, 2020:33

# The law of iterated logarithm for the estimations of diffusion-type processes

• Mingzhi Mao
• Gang Huang

## Abstract

This paper discusses the asymptotic behaviour of lasso-type estimators for diffusion-type processes with small noise. By constructing an objective function for the estimation and using a convexity argument, it is proved that the estimator satisfies the iterated logarithm law for different values of γ. The result also yields an exponential convergence principle for the convergence of the estimator to the true value.

## Keywords

The law of iterated logarithm; Diffusion-type processes; Approach of argmins; Objective function

## MSC

60J60; 60F05; 60J25; 60F10

## 1 Introduction

Let $$\{X_{t}^{\epsilon}\}_{0\leq t\leq T}$$ be the solution of the following stochastic diffusion-type process:
$$\begin{gathered} dX_{t}^{\epsilon}=S_{t} \bigl(\theta,X^{\epsilon}\bigr)\,dt +\epsilon\,dW_{t}, \\ X_{0}^{\epsilon}=x_{0} ,\end{gathered}$$
(1)
where $$x_{0}$$ is a fixed constant, $$\theta=(\theta_{1},\ldots,\theta _{p})^{\prime}\in\varTheta\subset\mathbb{R}^{p}$$ is an unknown parameter, $$S_{t}(\theta,x)$$ is a known measurable, nonanticipative function such that (1) has a unique strong solution, $$W_{t}$$ is a standard Wiener process, which is usually called random noise, and $$\epsilon\in(0,1]$$ is the diffusion coefficient.
There is a rich literature on methods for estimating the parameter θ, such as least squares estimation, Bayesian estimation, maximum likelihood estimation, and so on. However, there is no uniform standard for comparing these methods. The limiting properties of the various estimators attract the attention of statisticians because of their applicability in mathematical finance, biology and other fields; see [5, 16, 21]. As described by Kutoyants [16], minimum distance estimation is a relatively new method compared with traditional estimation methods; it requires fewer stochastic calculations and is robust. This paper considers a class of minimum distance estimators for the diffusion process (1) and discusses their exponential convergence principle. To estimate the unknown parameter θ, we need to introduce the ordinary differential equation:
$$dX_{t}^{0}=S_{t}\bigl( \theta,X^{0}\bigr)\,dt,\quad 0\leq t\leq T,$$
(2)
with initial condition $$X_{0}^{0}=x_{0}$$.
The minimum distance estimator $$\widehat{\theta}^{\epsilon}$$ is given by
\begin{aligned}[b] \widehat{\theta}^{\epsilon}&= \mathop{\operatorname {argmin}}_{\theta\in\overline{\varTheta }} \bigl\Vert X^{\epsilon}-X^{0} \bigr\Vert ^{2} \\ &=\mathop{\operatorname {argmin}}_{\theta\in\overline{\varTheta}} \int^{T}_{0} \bigl(X_{t}^{\epsilon}-X^{0}_{t} \bigr)^{2}\alpha(dt), \end{aligned}
(3)
where Θ̅ is the closure of Θ, α is some finite measure on $$[0,T]$$ and $$\operatorname {argmin}g=\{x:g(x)=\inf g\}$$. Kutoyants discussed the consistency and asymptotic normality of estimator $$\widehat{\theta}^{\epsilon}$$ in [17, 18]. Dietz and Kutoyants [7] considered a class of minimum distance estimators for diffusion processes with ergodic properties. For general minimum distance estimators, the reader can refer to Kutoyants [16] and references therein. Recently, Zhao and Zhang [22] studied the minimum distance parameter estimation for stochastic differential equations with small α-stable noises.
However, few works have considered the rate at which $$\widehat{\theta }^{\epsilon}$$ converges to the true value of θ. As a consequence, some stochastic dynamical systems cannot be reasonably identified from finitely many observations at spaced time points. Moreover, from the point of view of probability, the law of large numbers, the central limit theorem, and the iterated logarithm law together form a coherent limit theory. This motivates us to study the convergence rate of $$\widehat{\theta}^{\epsilon}\rightarrow\theta^{*}$$ ($$\theta^{*}$$ is the true value of θ). In order to generalize the model, we use the constrained minimum distance estimator based on the $$L_{\gamma}$$-penalized contrast function:
\begin{aligned} Z_{\epsilon}(\theta)&= \int_{0}^{T} \bigl(X_{t}^{\epsilon}-X^{0}_{t} \bigr)^{2}\alpha (dt)+\lambda_{\epsilon}\sum _{j=1}^{p} \vert \theta_{j} \vert ^{\gamma}, \end{aligned}
(4)
where $$\gamma>0$$ is a fixed constant and $$\lambda_{\epsilon}>0$$ is a penalty parameter depending on ϵ. Without loss of generality, we assume that the trend functional of the process (1) is of integral type:
$$S_{t}\bigl(\theta,X^{\epsilon}\bigr)=V\bigl(\theta,t,X^{\epsilon}\bigr)+ \int^{t}_{0}K\bigl(\theta ,t,s,X^{\epsilon}_{s} \bigr)\,ds.$$
Denote the true value of θ by $$\theta^{*}$$ and the lasso-type estimator of θ by
$$\widehat{\theta}^{\epsilon}=\mathop{\operatorname {argmin}}_{\theta\in\overline{\varTheta }}Z_{\epsilon}(\theta).$$
(5)
In this paper, we will discuss the limit behaviors of $$\frac{\widehat {\theta}^{\epsilon}-\theta^{*}}{\epsilon\sqrt{2\log\log(\epsilon ^{-1}\vee3)}}$$, i.e., the iterated logarithm law. This shows that the estimator $$\widehat{\theta}^{\epsilon}$$ converges to a constant almost everywhere with an exponential convergence rate. We recall that Gregorio and Iacus showed that $$\widehat{\theta}^{\epsilon}$$ satisfies the central limit theorem in [11], that is,
$$\epsilon^{-1} \bigl(\widehat{\theta}^{\epsilon}-\theta^{*} \bigr) \Rightarrow\mathop{\operatorname {argmin}}_{u}V(u),$$
where ⇒ denotes convergence in distribution and $$V(u)$$ is a fixed random function. Our result can therefore be regarded as a complement to the work of Gregorio and Iacus.
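To make the penalized contrast (4) concrete, the following is a minimal Python sketch of its discretised form. Several ingredients are assumptions made only for illustration and are not taken from the paper: the drift is the hypothetical two-parameter form $$S_{t}=\theta_{1}+\theta_{2}t$$ (so the limit path is known in closed form), α is taken to be Lebesgue measure on $$[0,T]$$, the integral is replaced by a crude Riemann sum, and the names `model_path` and `penalized_contrast` are invented here.

```python
def model_path(theta, x0=1.0, T=1.0, n=200):
    """Limit path X^0_t for the hypothetical drift S_t = theta_1 + theta_2 * t."""
    th1, th2 = theta
    dt = T / n
    return [x0 + th1 * (k * dt) + 0.5 * th2 * (k * dt) ** 2 for k in range(n + 1)]


def penalized_contrast(path, theta, lam, gamma, T=1.0):
    """Discretised Z_eps(theta) from (4): squared distance plus L_gamma penalty.

    Assumption: alpha is Lebesgue measure, approximated by a Riemann sum.
    """
    n = len(path) - 1
    dt = T / n
    model = model_path(theta, T=T, n=n)
    fit = sum((x - m) ** 2 for x, m in zip(path, model)) * dt
    penalty = lam * sum(abs(th) ** gamma for th in theta)
    return fit + penalty


# Noiseless "observation" generated at theta* = (1, 0), for illustration only.
observed = model_path((1.0, 0.0))
z_true = penalized_contrast(observed, (1.0, 0.0), lam=0.5, gamma=1.0)
z_off = penalized_contrast(observed, (1.0, 0.3), lam=0.5, gamma=1.0)
print(z_true, z_off)
```

At the true parameter the distance term vanishes and only the penalty remains, so moving the zero component away from zero increases both terms. This is the mechanism by which the $$L_{\gamma}$$ penalty favours sparse solutions.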

## 2 Preliminaries

This section presents the basic notation and assumptions used in the paper. Define the inner product on $$\mathbb{R}^{p}$$ by $$\langle x,y\rangle=\sum_{i=1}^{p}x_{i}y_{i}$$. In particular, write $$|\cdot|$$ for the Euclidean norm, that is, $$|y|=\sqrt{y^{\prime}y}=\sqrt{\sum_{i=1}^{p}y_{i}^{2}}$$, where $$y^{\prime}$$ denotes the transpose of y. Let B be some Banach space and write $$\|\cdot\|$$ for the corresponding norm. If B is the space of all continuous bounded functions on $$\mathbb{R}^{p}$$, we always define $$\|f\|=\sup_{x\in\mathbb {R}^{p}}|f(x)|$$ for any $$f\in\mathbf{B}$$. Let $$D(\mathbb{T})$$ be the space of càdlàg functions (i.e., right continuous with left limits) on $$\mathbb{T}$$ with the Skorohod topology. For any set $$A\subset\mathbb{R}^{p}$$, we define the distance from $$x\in\mathbb{R}^{p}$$ to A by $$\rho(x,A)=\inf_{y\in A}\rho (x,y)$$. If $$\{x^{\epsilon}\}$$ is a suitable family of points in $$\mathbb {R}^{p}$$, then let $$\mathbf{C}(\{x^{\epsilon}\})$$ denote the cluster set of $$\{x^{\epsilon}\}$$, that is, the set of all possible limit points of $$\{x^{\epsilon}\}$$ as $$\epsilon\rightarrow0$$. We sometimes use the notation $$\lim_{\epsilon\rightarrow0}x^{\epsilon}=A$$ if both $$\lim_{\epsilon\rightarrow0}\rho(x^{\epsilon},A)=0$$ and $$\mathbf{C}(\{ x^{\epsilon}\})=A$$. Throughout the paper, let $$\mathbf{P}_{\theta}$$ denote the law of $$X_{t}(\theta)$$ under parameter θ. The subscript θ indicates that the process $$X_{t}^{\epsilon}(\theta)$$ depends on θ. If it causes no confusion, we omit θ, i.e., $$X_{t}^{\epsilon}=X_{t}^{\epsilon}(\theta)$$.

Define $$V_{x}(\theta,t,x)=\frac{\partial}{\partial x}V(\theta,t,x)$$ and $$K_{x}(\theta,t,s,x)=\frac{\partial}{\partial x}K(\theta,t,s,x)$$. Let $$Y_{t}=\{Y_{t}(\theta), 0\leq t\leq T\}$$ be the solution of a diffusion-type process
$$d Y_{t}=\biggl(V_{x}\bigl(\theta,t,X^{0}(\theta)\bigr)Y_{t}+\int_{0}^{t}K_{x}\bigl(\theta,t,s,X_{s}^{0}(\theta)\bigr)Y_{s}\,ds\biggr)\,dt+dW_{t}$$
(6)
with initial condition $$Y_{0}=0$$. The process $$Y_{t}$$ plays a central role in the study of the asymptotic distribution of the estimators in the theory of diffusion process $$X_{t}$$ with small noise. Denote the p-dimensional vector of partial derivatives of $$X_{t}^{0}(\theta)$$ with respect to $$\theta_{j}$$ ($$j=1,\ldots,p$$) by $$\dot{X}_{t}^{0}(\theta)$$, that is,
\begin{aligned} [b]\dot{X}_{t}^{0}(\theta)&= \frac{\partial X_{t}^{0}(\theta)}{\partial\theta } \\ &= \biggl(\frac{\partial}{\partial\theta_{1}}X_{t}^{0}(\theta),\ldots , \frac{\partial}{\partial\theta_{p}}X_{t}^{0}(\theta) \biggr)^{\prime}. \end{aligned}
(7)
It is easy to see that $$\dot{X}_{t}^{0}(\theta)$$ satisfies the following differential equation:
\begin{aligned}[b] \frac{d\dot{X}_{t}^{0}(\theta)}{dt}={}&V_{x}\bigl(\theta,t,X_{t}^{0}(\theta)\bigr)\dot{X}_{t}^{0}(\theta)+\dot{V}\bigl(\theta,t,X_{t}^{0}(\theta)\bigr) \\ &{}+\int_{0}^{t}\bigl(\dot{K}\bigl(\theta,t,s,X_{s}^{0}(\theta)\bigr)+K_{x}\bigl(\theta,t,s,X_{s}^{0}(\theta)\bigr)\dot{X}^{0}_{s}(\theta)\bigr)\, ds \end{aligned}
(8)
and $$\dot{X}_{0}^{0}(\theta)=0$$, where $$\dot{V}(\theta,t,X_{t}^{0}(\theta ))= (\frac{\partial}{\partial\theta_{1}}V(\theta,t,X_{t}^{0}(\theta )),\ldots,\frac{\partial}{\partial\theta_{p}}V(\theta ,t,X_{t}^{0}(\theta)) )^{\prime}$$.
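The sensitivity equation (8) can be checked numerically in a simple case. The sketch below assumes the scalar linear drift $$V(\theta,t,x)=\theta x$$ with $$K\equiv0$$ (the model used in the Example of Sect. 3), for which (8) reduces to $$d\dot{X}_{t}^{0}/dt=\theta\dot{X}_{t}^{0}+X_{t}^{0}$$ and the closed form is $$\dot{X}_{t}^{0}=tx_{0}e^{\theta t}$$. The explicit Euler scheme, the step count and the function name `sensitivity_euler` are illustrative choices, not part of the paper.

```python
import math


def sensitivity_euler(theta, x0=1.0, T=1.0, n=10000):
    """Integrate X^0 and its theta-derivative jointly by explicit Euler.

    Assumed model: dX^0/dt = theta * X^0, so V_x = theta and dV/dtheta = X^0.
    """
    dt = T / n
    x, s = x0, 0.0  # s approximates the derivative of X^0_t with respect to theta
    for _ in range(n):
        # Both updates use the values from the previous step.
        x, s = x + theta * x * dt, s + (theta * s + x) * dt
    return x, s


theta, x0, T = 0.5, 1.0, 1.0
x_num, s_num = sensitivity_euler(theta, x0, T)
x_exact = x0 * math.exp(theta * T)
s_exact = T * x0 * math.exp(theta * T)
print(abs(x_num - x_exact), abs(s_num - s_exact))
```

Both discrepancies shrink proportionally to the step size, as expected for a first-order scheme.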
We suppose that the following regularity conditions on the trend coefficients $$V(\theta,t,x)$$ and $$K(\theta,t,s,x)$$ hold:
(A1)

$$\epsilon^{-1}\lambda_{\epsilon}\rightarrow\lambda_{0}\geq0$$;

(A2)
for any $$t\in[0,T]$$,
$$\sup_{\theta\in\varTheta,x\in\mathbb{R}^{p}} \biggl\vert \frac{\partial }{\partial\theta}V(\theta,t,x) \biggr\vert < \infty,\qquad \sup_{\theta\in \varTheta,x\in\mathbb{R}^{p}} \biggl\vert \frac{\partial}{\partial x}V( \theta ,t,x) \biggr\vert < \infty$$
and
$$\sup_{s,t\in[0,T]}\sup_{\theta\in\varTheta,x\in\mathbb{R}^{p}} \biggl\vert \frac{\partial}{\partial\theta}K(\theta,t,s,x) \biggr\vert < \infty, \qquad \sup _{s,t\in[0,T]}\sup_{\theta\in\varTheta,x\in\mathbb{R}^{p}} \biggl\vert \frac{\partial}{\partial x}K(\theta,t,s,x) \biggr\vert < \infty;$$
(A3)
there exist two positive constants $$\mathcal{M}_{1}$$ and $$\mathcal{M}_{2}$$ such that
$$\begin{gathered} \sup_{t\in[0,T],\theta\in\varTheta} \biggl\vert \frac{\partial}{\partial x}V(\theta,t,x)- \frac{\partial}{\partial x}V(\theta,t,y) \biggr\vert < \mathcal{M}_{1} \vert x-y \vert , \\ \sup_{t,s\in[0,T],\theta\in\varTheta} \biggl\vert \frac{\partial}{\partial x}K(\theta,t,s,x)- \frac{\partial}{\partial x}K(\theta,t,s,y) \biggr\vert < \mathcal{M}_{1} \vert x-y \vert \end{gathered}$$
and
$$\begin{gathered} \sup_{t\in[0,T]}\sup_{x\in\mathbb{R}^{p}} \biggl\vert \frac{\partial }{\partial x}V(\theta_{1},t,x)-\frac{\partial}{\partial x}V(\theta _{2},t,x) \biggr\vert \leq\mathcal{M}_{2} \vert \theta_{1}-\theta_{2} \vert , \\ \sup_{t,s\in[0,T]}\sup_{x\in\mathbb{R}^{p}} \biggl\vert \frac{\partial }{\partial x}K(\theta_{1},t,s,x)-\frac{\partial}{\partial x}K(\theta _{2},t,s,x) \biggr\vert \leq\mathcal{M}_{2} \vert \theta_{1}-\theta_{2} \vert .\end{gathered}$$
One can check that, under conditions (A2) and (A3), the stochastic differential equation (1) satisfies Condition $$\mathcal{L}$$, that is,
$$\begin{gathered} \bigl\vert V(\theta,t,X_{t})-V( \theta,t,Y_{t}) \bigr\vert + \bigl\vert K(\theta ,t,s,X_{t})-K(\theta,t,s,Y_{t}) \bigr\vert \\ \quad\leq L_{1} \int_{0}^{t} \vert X_{s}-Y_{s} \vert \, dK_{s}+L_{2} \vert X_{t}-Y_{t} \vert , \\ \bigl\vert V(\theta,t,X_{t}) \bigr\vert + \bigl\vert K( \theta,t,s,X_{t}) \bigr\vert \leq L_{1} \int_{0}^{t}\bigl(1+ \vert X_{s} \vert \bigr)\,dK_{s}+L_{2}\bigl(1+ \vert X_{t} \vert \bigr), \end{gathered}$$
where $$L_{1}$$ and $$L_{2}$$ are two positive constants and $$K_{s}$$ is a nondecreasing right-continuous function, $$0\leq K_{t}\leq K_{0}$$, $$K_{0}>0$$. By virtue of Theorem 4.6 of [19, 20], there exists a unique $$D([0,T],\mathbb{R}^{p})$$-valued strong solution of Eq. (1) under conditions (A2) and (A3) (see also [8] for the theory of existence and uniqueness). In Lemma 5 below, we will show that the deterministic process $$X_{t}^{0}(\theta)$$ is differentiable with respect to θ at the point $$\theta^{*}$$ in $$L_{2}$$-norm under conditions (A2) and (A3), i.e.,
$$\int_{0}^{T} \biggl(X_{t}^{0} \bigl(\theta^{*}+h\bigr)-X_{t}^{0}\bigl(\theta^{*} \bigr)-h^{\prime}\frac {\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}} \biggr)^{2} \alpha(dt)=o\bigl( \vert h \vert \bigr).$$
As discussed in the book of Kutoyants [16], this differentiability property is very important for statistical identification problems. We now introduce the objective function associated with $$\widehat{\theta^{\epsilon}}-\theta^{*}$$:
\begin{aligned} V_{\epsilon}(u)&=\frac{1}{\epsilon^{2}h(\epsilon)} \Biggl[ \int_{0}^{T} \bigl(X_{t}^{\epsilon}-X_{t}^{0} \bigl(\theta^{*}+h(\epsilon)\epsilon u\bigr) \bigr)^{2}\alpha (dt) \\ & - \int_{0}^{T} \bigl(X_{t}^{\epsilon}-X_{t}^{0} \bigl(\theta^{*}\bigr) \bigr)^{2}\alpha (dt)+\lambda_{\epsilon}\sum _{j=1}^{p} \bigl( \bigl\vert \theta_{j}^{*}+h(\epsilon )\epsilon u_{j} \bigr\vert ^{\gamma}- \bigl\vert \theta_{j}^{*} \bigr\vert ^{\gamma}\bigr) \Biggr], \end{aligned}
where $$u=(u_{1},\ldots,u_{p})^{\prime}$$. It is easy to see that
$$\frac{1}{\epsilon h(\epsilon)}\bigl(\widehat{\theta}^{\epsilon}-\theta ^{*}\bigr)\in \mathop{\operatorname {argmin}}_{u\in\mathbb{R}^{p}}V_{\epsilon}(u).$$
(9)
A simple calculation yields
\begin{aligned}[b] V_{\epsilon}(u) &=\frac{1}{\epsilon^{2}h(\epsilon)} \int_{0}^{T} \bigl(X_{t}^{0} \bigl(\theta ^{*}+h(\epsilon)\epsilon u\bigr)-X_{t}^{0}\bigl( \theta^{*}\bigr) \bigr)^{2}\alpha(dt) \\ &\quad +\frac{\lambda_{\epsilon}}{\epsilon^{2}h(\epsilon)}\sum_{j=1}^{p} \bigl\{ \bigl\vert \theta_{j}^{*}+h(\epsilon)\epsilon u_{j} \bigr\vert ^{\gamma}- \bigl\vert \theta_{j}^{*} \bigr\vert ^{\gamma}\bigr\} \\ &\quad -\frac{2}{\epsilon^{2}h(\epsilon)} \int_{0}^{T}\bigl(X_{t}^{\epsilon}-X_{t}^{0} \bigl(\theta ^{*}\bigr)\bigr) \bigl(X_{t}^{0} \bigl(\theta^{*}+h( \epsilon)\epsilon u \bigr)-X_{t}^{0}\bigl(\theta^{*}\bigr) \bigr)\alpha(dt). \end{aligned}
(10)
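The "simple calculation" behind (10) is an elementary algebraic identity applied pointwise in t. The following worked step spells it out; the shorthand $$a_{t}$$, $$b_{t}$$, $$c_{t}$$ is introduced here only for illustration and does not appear elsewhere in the paper.

```latex
% Shorthand (illustrative only):
%   a_t = X_t^{\epsilon},
%   b_t = X_t^{0}(\theta^{*}+h(\epsilon)\epsilon u),
%   c_t = X_t^{0}(\theta^{*}).
\begin{aligned}
(a_t-b_t)^{2}-(a_t-c_t)^{2}
  &=\bigl((a_t-c_t)-(b_t-c_t)\bigr)^{2}-(a_t-c_t)^{2} \\
  &=(b_t-c_t)^{2}-2(a_t-c_t)(b_t-c_t).
\end{aligned}
```

Integrating both sides against $$\alpha(dt)$$ and dividing by $$\epsilon^{2}h(\epsilon)$$ gives the first and third terms of (10). The penalty term is carried over unchanged from the definition of $$V_{\epsilon}(u)$$.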
In Sect. 4, we will show that $$V_{\epsilon}(u)$$ can be approximated by some stochastic function.

## 3 Main result

We state our main result as follows:

### Theorem 1

Let $$h(\epsilon)=\sqrt{2\log\log(\epsilon ^{-1}\vee3)}$$. Assume that conditions (A1)–(A3) hold. Then, for $$\gamma\geq1$$, the process $$(\widehat{\theta^{\epsilon}}-\theta ^{*}) /(\epsilon h(\epsilon))$$ satisfies the iterated logarithm law, that is,
$$\limsup_{\epsilon\rightarrow0}\rho \biggl(\frac{\widehat{\theta ^{\epsilon}}-\theta^{*}}{\epsilon h(\epsilon)},Q^{-1}K \biggr)=0,\quad \textit{a.e.}$$
and
$$\mathbf{P} \biggl(\omega: \mathbf{C} \biggl( \biggl\{ \frac{\widehat {\theta^{\epsilon}}-\theta^{*}}{\epsilon h(\epsilon)} \biggr\} \biggr)=Q^{-1}K \biggr)=1,$$
where $$\rho(\cdot,\cdot)$$ denotes the Euclidean distance, a.e. stands for almost everywhere, and
$$Q= \int_{0}^{T}\frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta =\theta^{*}}\cdot \biggl(\frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}} \biggr)^{\prime}\alpha(dt)$$
and
$$K= \biggl\{ \int_{0}^{T}g(t)\,\frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}} \alpha(dt): g\in\mathcal{H}\textit{ and }I(g)\leq1 \biggr\} .$$
(11)
Here
\begin{aligned}& I(g):=\inf_{\phi\in\mathcal{H}:g=Y_{t}^{\phi}} \biggl\{ \frac{1}{2} \int _{0}^{T} \bigl\vert \dot{\phi}(s) \bigr\vert ^{2}\,ds \biggr\} , \\& \begin{aligned}[b] \mathcal{H}={}&\biggl\{ \phi: \phi\textit{ is an absolutely continuous function with }\phi(0)=0 \\ & \textit{and } \int_{0}^{T} \bigl\vert \dot{\phi}(s) \bigr\vert ^{2}\,ds< \infty\biggr\} \end{aligned} \end{aligned}
(12)
and $$Y^{\phi}$$ is the solution of the integral equation given by
$$Y^{\phi}_{t}= \int_{0}^{t} \biggl[V_{x}\bigl( \theta,s,X_{s}^{0}\bigr)+ \int_{0}^{s}K_{x}\bigl(\theta ,s,u,X_{u}^{0}(\theta)\bigr)\,du \biggr]Y_{s}^{\phi}\,ds+\phi(t).$$
(13)
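To get a feeling for the normalisation in Theorem 1, the following Python sketch evaluates $$h(\epsilon)=\sqrt{2\log\log(\epsilon^{-1}\vee3)}$$ and the resulting scale $$\epsilon h(\epsilon)$$ for a few noise levels. The particular values of ϵ are arbitrary illustrative choices.

```python
import math


def h(eps):
    """Iterated-logarithm normalisation h(eps) = sqrt(2 log log(max(1/eps, 3)))."""
    return math.sqrt(2.0 * math.log(math.log(max(1.0 / eps, 3.0))))


for eps in (1e-1, 1e-3, 1e-6, 1e-12):
    # eps * h(eps) is the scale on which the estimation error is compared with Q^{-1}K.
    print(f"eps={eps:g}  h(eps)={h(eps):.4f}  eps*h(eps)={eps * h(eps):.3e}")
```

The factor $$h(\epsilon)$$ grows extremely slowly as $$\epsilon\rightarrow0$$, so the almost sure scale $$\epsilon h(\epsilon)$$ is only slightly larger than the scale ϵ of the central limit theorem.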

### Remark

For $$0<\gamma<1$$, supposing that conditions (A2)–(A3) hold and $$\lambda_{\epsilon}/\epsilon^{1-\gamma }\rightarrow\lambda_{0}\geq0$$, it can be still proved that $$(\widehat {\theta^{\epsilon}}-\theta^{*}) /(\epsilon h(\epsilon))$$ satisfies the iterated logarithm law. The proof method is the same as that of Theorem 1.

We now present an example that illustrates the above result.

### Example

Consider the following diffusion process:
$$dX^{\epsilon}_{t}=\theta X_{t}^{\epsilon}\,dt+ \epsilon\,dW_{t},\quad X_{0}^{\epsilon}=x_{0}\neq0, 0\leq t\leq T,$$
(14)
where $$\theta\in(\kappa_{1},\kappa_{2})=\varTheta$$ and $$W_{t}$$ is a standard Wiener process. The limit solution is
$$X_{t}^{0}=x_{0}e^{\theta t},\quad 0\leq t\leq T.$$
The minimum distance estimator $$\widehat{\theta}^{\epsilon}$$ is defined by
$$\widehat{\theta}^{\epsilon}=\mathop{\operatorname {argmin}}_{\theta\in\varTheta} \int_{0}^{T} \bigl(X_{t}^{\epsilon}-x_{0}e^{\theta t} \bigr)^{2}\alpha(dt).$$
It can be checked that the process (14) satisfies the conditions of Theorem 7.5 of [16], so $$\widehat{\theta}^{\epsilon}$$ is consistent and asymptotically normal:
$$\frac{\widehat{\theta}^{\epsilon}-\theta^{*}}{\epsilon}\Rightarrow \frac{\int_{0}^{T}te^{\theta^{*} t}Y_{t}\,\alpha(dt)}{x_{0}\int_{0}^{T}t^{2}e^{2\theta^{*} t}\alpha(dt)},$$
where $$Y_{t}$$ satisfies $$dY_{t}=\theta^{*} Y_{t}\,dt+dW_{t}$$, $$Y_{0}=0$$, $$0\leq t\leq T$$. It can be easily proved that $$\int_{0}^{T}te^{\theta^{*} t}Y_{t}\,\alpha(dt)$$ is a centered Gaussian random variable with variance
$$\sigma^{2}=\frac{1}{2\theta^{*}} \int_{0}^{T} \int_{0}^{T}ste^{2\theta ^{*}(s+t)} \bigl[1-e^{-2\theta^{*}(t\wedge s)} \bigr]\alpha(dt)\alpha(ds).$$
(15)
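The variance (15) has no simple closed form in general, but it is easy to approximate numerically. The Python sketch below assumes that α is Lebesgue measure on $$[0,T]$$ and that $$\theta^{*}>0$$, and uses a midpoint rule on an n × n grid. These choices, the parameter values and the function names `sigma2` and `limit_ratio` are illustrative only.

```python
import math


def sigma2(theta, T=1.0, n=200):
    """Midpoint-rule approximation of (15), assuming alpha(dt) = dt."""
    dt = T / n
    mids = [(k + 0.5) * dt for k in range(n)]
    total = 0.0
    for s in mids:
        for t in mids:
            total += (s * t * math.exp(2.0 * theta * (s + t))
                      * (1.0 - math.exp(-2.0 * theta * min(s, t))))
    return total * dt * dt / (2.0 * theta)


def limit_ratio(theta, x0=1.0, T=1.0, n=200):
    """sigma divided by x0 * int_0^T t^2 exp(2 theta t) dt, again with alpha(dt) = dt."""
    dt = T / n
    denom = x0 * sum(((k + 0.5) * dt) ** 2 * math.exp(2.0 * theta * (k + 0.5) * dt)
                     for k in range(n)) * dt
    return math.sqrt(sigma2(theta, T, n)) / denom


s2 = sigma2(0.5)
ratio = limit_ratio(0.5)
print(s2, ratio)
```

Halving the grid spacing changes the computed value only marginally, which gives a quick sanity check on the discretisation.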
By virtue of Theorem 1, a simple calculation can show that the estimator $$\widehat{\theta^{\epsilon}}$$ has the following limit behavior:
$$\limsup_{\epsilon\rightarrow0}\frac{\widehat{\theta^{\epsilon}}-\theta^{*}}{\epsilon\sqrt{2\log\log(\epsilon^{-1}\vee3)}}=\frac {\sigma}{x_{0}\int_{0}^{T}t^{2}e^{2\theta^{*} t}\alpha(dt)},\quad \text{a.e.}$$
where $$\sigma^{2}$$ is defined in (15).
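The example can also be explored by simulation. The following Python sketch draws one path of (14) with an Euler–Maruyama scheme and computes the minimum distance estimator by a grid search. All numerical choices are assumptions made for illustration: α is Lebesgue measure, $$\varTheta=(0,1)$$, $$\theta^{*}=0.5$$, $$x_{0}=1$$, $$T=1$$, $$\epsilon=0.01$$, and the function names are invented here. A single simulated path only illustrates the scale of the error; it does not verify the almost sure limit.

```python
import math
import random


def simulate_path(theta, eps, x0=1.0, T=1.0, n=500, seed=0):
    """Euler-Maruyama discretisation of dX = theta * X dt + eps dW."""
    rng = random.Random(seed)
    dt = T / n
    x = [x0]
    for _ in range(n):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        x.append(x[-1] + theta * x[-1] * dt + eps * dw)
    return x


def distance(path, theta, x0=1.0, T=1.0):
    """Riemann-sum version of int_0^T (X_t - x0 exp(theta t))^2 dt."""
    n = len(path) - 1
    dt = T / n
    return sum((path[k] - x0 * math.exp(theta * k * dt)) ** 2
               for k in range(n + 1)) * dt


def min_distance_estimate(path, lo=0.0, hi=1.0, grid=400):
    """Grid search over the closure of the assumed parameter set (lo, hi)."""
    candidates = [lo + (hi - lo) * i / grid for i in range(grid + 1)]
    return min(candidates, key=lambda th: distance(path, th))


theta_star, eps = 0.5, 0.01
path = simulate_path(theta_star, eps)
theta_hat = min_distance_estimate(path)
h_eps = math.sqrt(2.0 * math.log(math.log(max(1.0 / eps, 3.0))))
normalised_error = (theta_hat - theta_star) / (eps * h_eps)
print(theta_hat, normalised_error)
```

The grid resolution limits the accuracy of `theta_hat`, so for very small ϵ a finer grid or a one-dimensional optimiser would be needed before the normalised error becomes meaningful.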

## 4 Proofs

In order to prove Theorem 1, we need the following lemmas. Lemma 2 concerns the approach of argmins, which is crucial for studying the asymptotic theory of argmin processes of parametrized convex objective functions (see [10] and [14] for other applications).

### Lemma 1

Suppose that $$A^{\epsilon}(u)$$ and $$B^{\epsilon}(u)$$, $$u\in\mathbb{R}^{p}$$, are two convex bounded functions. Assume that $$\lim_{\epsilon\rightarrow0}h(\epsilon)=\infty$$ and for any $$u\in \mathbb{R}^{p}$$, $$\delta>0$$,
$$\limsup_{\epsilon\rightarrow0}\frac{1}{h^{2}(\epsilon)}\log\mathbf {P} \bigl( \bigl\vert A^{\epsilon}(u)-B^{\epsilon}(u) \bigr\vert \geq\delta \bigr)=- \infty.$$
(16)
Then for any compact set $$D\subset\mathbb{R}^{p}$$,
$$\limsup_{\epsilon\rightarrow0}\frac{1}{h^{2}(\epsilon)}\log\mathbf {P} \Bigl(\sup _{u\in D } \bigl\vert A^{\epsilon}(u)-B^{\epsilon}(u) \bigr\vert \geq \epsilon \Bigr)=-\infty.$$
(17)

### Proof

The approach stems from Lemma 3 of Kato [13]. For the sake of completeness, we sketch the proof. By the convexity and boundedness of $$A^{\epsilon}(u)$$, there exists a constant $$\beta_{1}>0$$ satisfying
$$\bigl\vert A^{\epsilon}(u)-A^{\epsilon}(v) \bigr\vert \leq \beta_{1} \vert u-v \vert ,\quad \text{for any } u, v\in D.$$
(18)
Similarly, there exists another constant $$\beta_{2}>0$$ satisfying
$$\bigl\vert B^{\epsilon}(u)-B^{\epsilon}(v) \bigr\vert \leq \beta_{2} \vert u-v \vert ,\quad \text{for any } u, v\in D.$$
(19)
Let $$\beta_{0}=\max\{\beta_{1},\beta_{2}\}$$. For any $$\epsilon>0$$, there exists a finite set $$D_{1}\subset D$$ such that each point of D lies within the distance $$\frac{\epsilon}{3\beta_{0}}$$ of at least one point of $$D_{1}$$. Equation (16) implies that
$$\limsup_{\epsilon\rightarrow0}\frac{1}{h^{2}(\epsilon)}\log\mathbf {P} \Bigl(\sup _{v\in D_{1}} \bigl\vert A^{\epsilon}(v)-B^{\epsilon}(v) \bigr\vert \geq \epsilon \Bigr)=-\infty.$$
(20)
Given any $$u\in D$$, let v be a point of $$D_{1}$$ such that $$|u-v|\leq \frac{\epsilon}{3\beta_{0}}$$. Then
\begin{aligned} \bigl\vert A^{\epsilon}(u)-B^{\epsilon}(u) \bigr\vert &\leq \bigl\vert A^{\epsilon}(u)-A^{\epsilon}(v) \bigr\vert + \bigl\vert A^{\epsilon}(v)-B^{\epsilon}(v) \bigr\vert + \bigl\vert B^{\epsilon}(v)-B^{\epsilon}(u) \bigr\vert \\ &\leq\beta_{1} \vert u-v \vert + \bigl\vert A^{\epsilon}(v)-B^{\epsilon}(v) \bigr\vert +\beta_{2} \vert v-u \vert \\ &\leq\frac{2\epsilon}{3}+ \bigl\vert A^{\epsilon}(v)-B^{\epsilon}(v) \bigr\vert . \end{aligned}
Consequently,
$$\mathbf{P} \bigl( \bigl\vert A^{\epsilon}(u)-B^{\epsilon}(u) \bigr\vert \geq\epsilon \bigr)\leq\mathbf{P} \biggl( \bigl\vert A^{\epsilon}(v)-B^{\epsilon}(v) \bigr\vert \geq\frac {\epsilon}{3} \biggr).$$
By virtue of (20), the desired result is obtained. □

### Lemma 2

Suppose that $$A^{\epsilon}(u)$$ and $$B^{\epsilon}(u)$$ are two suitable families of convex random functions defined on a compact set $$\mathcal{S}\subset \mathbb{R}^{p}$$, where $$\epsilon\in(0,1]$$ is the index parameter. Let $$a^{\epsilon}$$ be the argmin of $$A^{\epsilon}(u)$$ and assume that $$B^{\epsilon}(u)$$ has a unique argmin $$b^{\epsilon}$$. Then for any positive constant δ,
$$\mathbf{P} \bigl( \bigl\vert a^{\epsilon}-b^{\epsilon}\bigr\vert > \delta \bigr)\leq\mathbf {P} \biggl(\widetilde{\triangle}_{\epsilon}\geq \frac{\eta}{2} \biggr),$$
(21)
where $$\widetilde{\triangle}_{\epsilon}=\sup_{u\in\{u:|u-b^{\epsilon}|\leq\delta\}} |A^{\epsilon}(u)-B^{\epsilon}(u) |$$ and
$$\eta=\inf_{v\in\mathbb{S}^{p-1}} \bigl\{ B^{\epsilon}\bigl(b^{\epsilon}+\delta v\bigr)-B^{\epsilon}\bigl(b^{\epsilon}\bigr) \bigr\} .$$
(22)

### Proof

Let $$\mathbb{S}^{p-1}=\{x\in\mathbb{R}^{p}:|x|=1\}$$. For any $$v\in\mathbb{S}^{p-1}$$, the convexity of $$A^{\epsilon}(u)$$ yields
$$\biggl(1-\frac{\delta}{l}\biggr)A^{\epsilon}\bigl(b^{\epsilon}\bigr)+ \frac{\delta }{l}A^{\epsilon}\bigl(b^{\epsilon}+lv\bigr)\geq A^{\epsilon}\bigl(b^{\epsilon}+\delta v\bigr), \quad\forall l>\delta.$$
It is equivalent to
$$\frac{\delta}{l} \bigl(A^{\epsilon}\bigl(b^{\epsilon}+lv \bigr)-A^{\epsilon}\bigl(b^{\epsilon}\bigr) \bigr)\geq A^{\epsilon}\bigl(b^{\epsilon}+\delta v\bigr)-A^{\epsilon}\bigl(b^{\epsilon}\bigr).$$
Let $$\triangle_{\epsilon}(u)=A^{\epsilon}(u)-B^{\epsilon}(u)$$. We have
\begin{aligned}[b] &\frac{\delta}{l} \bigl(A^{\epsilon}\bigl(b^{\epsilon}+lv\bigr)-A^{\epsilon}\bigl(b^{\epsilon}\bigr) \bigr) \\ &\quad\geq B^{\epsilon}\bigl(b^{\epsilon}+\delta v\bigr)-B^{\epsilon}\bigl(b^{\epsilon}\bigr) + \bigl(\triangle_{\epsilon}\bigl(b^{\epsilon}+ \delta v\bigr)-\triangle_{\epsilon}\bigl(b^{\epsilon}\bigr) \bigr) \\ &\quad\geq\eta-2\widetilde{\triangle}_{\epsilon}. \end{aligned}
(23)
Since $$\mathcal{S}$$ is a compact set and $$b^{\epsilon}$$ is the unique argmin point of $$B^{\epsilon}$$, η is a positive random variable. If $$\widetilde{\triangle}_{\epsilon}<\frac{\eta}{2}$$, then by (23) $$A^{\epsilon}(b^{\epsilon}+lv)-A^{\epsilon}(b^{\epsilon})>0$$ for each v and each $$l>\delta$$. This implies that if $$|a^{\epsilon}-b^{\epsilon}|>\delta$$, then $$A^{\epsilon}(a^{\epsilon})-A^{\epsilon}(b^{\epsilon})>0$$, which contradicts the minimum property of $$a^{\epsilon}$$. Thus, for any positive constant δ,
$$\mathbf{P} \bigl( \bigl\vert a^{\epsilon}-b^{\epsilon}\bigr\vert > \delta \bigr)\leq\mathbf {P} \biggl(\widetilde{\triangle}_{\epsilon}\geq \frac{\eta}{2} \biggr).$$
The proof is completed. □
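The deterministic core of Lemma 2 can be illustrated in one dimension, where $$\mathbb{S}^{p-1}=\{-1,+1\}$$. The Python sketch below uses two hypothetical convex functions, $$B(u)=u^{2}$$ and $$A(u)=B(u)+0.05|u-1|$$, chosen only for illustration, and checks on a grid that $$\widetilde{\triangle}<\eta/2$$ indeed forces the argmin of A to lie within δ of the argmin of B.

```python
def B(u, b=0.0, q=2.0):
    """Quadratic reference function with unique argmin b."""
    return 0.5 * q * (u - b) ** 2


def A(u):
    """Convex perturbation of B (a sum of two convex functions)."""
    return B(u) + 0.05 * abs(u - 1.0)


b, delta = 0.0, 0.5
grid = [-2.0 + 0.001 * i for i in range(4001)]

a = min(grid, key=A)  # grid approximation of the argmin of A
ball = [u for u in grid if abs(u - b) <= delta]
tilde_delta = max(abs(A(u) - B(u)) for u in ball)  # sup of |A - B| on the ball
eta = min(B(b + delta * v) - B(b) for v in (-1.0, 1.0))  # (22) with p = 1

print(tilde_delta, eta / 2, abs(a - b))
```

Here the uniform distance between A and B on the ball is below $$\eta/2$$, and the computed argmin of A stays well inside the ball of radius δ around b, in line with (21).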

### Lemma 3

Assume that $$A^{\epsilon}(u)$$ is a convex random function defined in an open set $$\mathcal{S}\subset\mathbb{R}^{p}$$. Let $$B^{\epsilon}(u)=-u^{T}U^{\epsilon}+\frac{1}{2}u^{T}Qu$$, where Q is a symmetric and positive definite $$p\times p$$ matrix and $$U^{\epsilon}$$ is stochastically bounded. Furthermore, let $$1\leq h(\epsilon)=o(1/\sqrt{\epsilon})$$ and assume that the following two conditions hold:
(i)
The random process $$U^{\epsilon}$$ satisfies the iterated logarithm law, that is, there exists a fixed bounded symmetric set K in $$\mathbb {R}^{p}$$ such that
$$\limsup_{\epsilon\rightarrow0}\rho \biggl(\frac{U^{\epsilon}}{\sqrt {2\epsilon\log\log(\epsilon^{-1}\vee3)}},K \biggr)=0, \quad\textit{a.e.}$$
and
$$\mathbf{P} \biggl(\omega: \mathbf{C} \biggl( \biggl\{ \frac{U^{\epsilon}}{\sqrt{2\epsilon\log\log(\epsilon^{-1}\vee3)}} \biggr\} \biggr)=K \biggr)=1.$$

(ii)
For any $$R>0$$ and any $$\delta>0$$, there exists an $$\epsilon_{0}>0$$ such that for all $$\epsilon\in(0,\epsilon_{0}]$$,
$$\mathbf{P} \bigl( \bigl\vert A^{\epsilon}(u)-B^{\epsilon}(u) \bigr\vert \geq\delta h(\epsilon) \bigr)\leq e^{-Rh^{2}(\epsilon)}.$$
(24)
Then $$a^{\epsilon}$$, the minimizer of the convex process $$A^{\epsilon}(u)$$, satisfies the iterated logarithm law, that is,
$$\limsup_{\epsilon\rightarrow0}\rho \biggl(\frac{a^{\epsilon}}{\sqrt {2\epsilon\log\log(\epsilon^{-1}\vee3)}},Q^{-1}K \biggr)=0, \quad\textit{a.e.}$$
and
$$\mathbf{P} \biggl(\omega: \mathbf{C} \biggl( \biggl\{ \frac{a^{\epsilon}}{\sqrt{2\epsilon\log\log(\epsilon^{-1}\vee3)}} \biggr\} \biggr)=Q^{-1}K \biggr)=1,$$
where $$Q^{-1}K=\{Q^{-1}x: x\in K\}$$.

### Proof

Let $$b^{\epsilon}=Q^{-1}U^{\epsilon}$$. It is easy to see that $$b^{\epsilon}$$ is the unique minimum point of $$B^{\epsilon}(u)$$. The continuous mapping theorem in the iterated logarithm law yields
$$\limsup_{\epsilon\rightarrow0}\rho \biggl(\frac{b^{\epsilon}}{\sqrt {2\epsilon\log\log(\epsilon^{-1}\vee3)}},Q^{-1}K \biggr)=0, \quad\text{a.e.}$$
and
$$\mathbf{P} \biggl(\omega: \mathbf{C} \biggl( \biggl\{ \frac{b^{\epsilon}}{\sqrt{2\epsilon\log\log(\epsilon^{-1}\vee3)}} \biggr\} \biggr)=Q^{-1}K \biggr)=1.$$
Then a simple calculation shows that
\begin{aligned}[b] B^{\epsilon}(u)-B^{\epsilon}\bigl(b^{\epsilon}\bigr)&=\frac{1}{2}\bigl(u-b^{\epsilon}\bigr)^{\prime}Q\bigl(u-b^{\epsilon}\bigr) \\ &\geq c \bigl\vert u-b^{\epsilon}\bigr\vert ^{2}, \end{aligned}
(25)
where $$c>0$$ is half the smallest eigenvalue of Q.
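For completeness, the calculation behind (25) can be written out as follows. It uses only $$U^{\epsilon}=Qb^{\epsilon}$$ and the symmetry of Q. The symbol $$\lambda_{\min}(Q)$$ for the smallest eigenvalue of Q is notation introduced here for illustration.

```latex
% Uses U^{\epsilon} = Q b^{\epsilon} and Q = Q^{\prime}.
\begin{aligned}
B^{\epsilon}(u)-B^{\epsilon}\bigl(b^{\epsilon}\bigr)
  &=\Bigl(-u^{\prime}Qb^{\epsilon}+\tfrac{1}{2}u^{\prime}Qu\Bigr)
    -\Bigl(-\bigl(b^{\epsilon}\bigr)^{\prime}Qb^{\epsilon}
           +\tfrac{1}{2}\bigl(b^{\epsilon}\bigr)^{\prime}Qb^{\epsilon}\Bigr) \\
  &=\tfrac{1}{2}u^{\prime}Qu-u^{\prime}Qb^{\epsilon}
    +\tfrac{1}{2}\bigl(b^{\epsilon}\bigr)^{\prime}Qb^{\epsilon} \\
  &=\tfrac{1}{2}\bigl(u-b^{\epsilon}\bigr)^{\prime}Q\bigl(u-b^{\epsilon}\bigr)
   \;\geq\;\tfrac{1}{2}\lambda_{\min}(Q)\,\bigl\vert u-b^{\epsilon}\bigr\vert^{2}.
\end{aligned}
```

Hence the lower bound in (25) holds for any constant $$c\leq\lambda_{\min}(Q)/2$$, and taking $$u=b^{\epsilon}+\delta v$$ with $$|v|=1$$ gives $$\eta\geq c\delta^{2}$$ in the notation of Lemma 2.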
As in the proof of Lemma 2, note that by the definition of η, we further have
$$\mathbf{P} \bigl( \bigl\vert a^{\epsilon}-b^{\epsilon}\bigr\vert > \delta \bigr)\leq\mathbf {P} \biggl(\widetilde{\triangle}_{\epsilon}\geq \frac{c\delta^{2}}{2} \biggr).$$
(26)
From (25) and (26), combining condition (ii), we have
$$\mathbf{P} \bigl( \bigl\vert a^{\epsilon}-b^{\epsilon}\bigr\vert \geq\delta h(\epsilon) \bigr)\leq e^{-Rh^{2}(\epsilon)}.$$
This implies that $$a^{\epsilon}$$, the minimizer of convex process $$A^{\epsilon}(u)$$, satisfies the iterated logarithm law. □

### Lemma 4

Assume $$X_{t}^{\epsilon}$$ is the solution of the following stochastic differential equation:
$$dX_{t}^{\epsilon}=b\bigl(X_{t}^{\epsilon}\bigr)\,dt+ \sqrt{\epsilon}\sigma \bigl(X_{t}^{\epsilon}\bigr) \,dB_{t},\quad X_{0}^{\epsilon}=x_{0},$$
and that $$X^{0}_{t}$$ is the solution of the ordinary differential equation
$$dX_{t}^{0}=b\bigl(X_{t}^{0}\bigr)\,dt,\quad X_{0}^{0}=x_{0}.$$
Assume that $$b(\cdot)$$ and $$\sigma(\cdot)$$ are Lipschitz continuous on every compact subset of $$\mathbb{R}$$ and that there exists a positive constant L such that, for any $$x,y\in\mathbb{R}^{+}$$,
$$xb(x)\leq L\bigl(1+ \vert x \vert ^{2}\bigr),\qquad \bigl\vert \sigma(x)-\sigma(y) \bigr\vert \leq L \vert x-y \vert ^{2}.$$
Then, the process $$\frac{X_{t}^{\epsilon}-X^{0}_{t}}{\sqrt{2\epsilon\log \log(\epsilon^{-1}\vee3)}}$$ satisfies the iterated logarithm law, that is,
$$\limsup_{\epsilon\rightarrow0}\rho \biggl(\frac{X_{t}^{\epsilon}-X^{0}_{t}}{\sqrt{2\epsilon\log\log(\epsilon^{-1}\vee3)}},K \biggr)=0,\quad \textit{a.e.}$$
and
$$\mathbf{P} \biggl(\omega: \mathbf{C} \biggl( \biggl\{ \frac{X_{t}^{\epsilon}-X^{0}_{t}}{\sqrt{2\epsilon\log\log(\epsilon^{-1}\vee3)}} \biggr\} \biggr)=K \biggr)=1.$$
Here, $$K= \{g: g\in\mathcal{H} \textit{ and } I(g)\leq1 \}$$ and
$$I(g):=\inf_{\phi\in\mathcal{H}:g=Y_{t}^{\phi}} \biggl\{ \frac{1}{2} \int _{0}^{T} \bigl\vert \dot{\phi}(s) \bigr\vert ^{2}\,ds \biggr\} ,$$
where $$\mathcal{H}$$ is defined in (12) and $$Y^{\phi}$$ is the solution of the integral equation given by
$$Y^{\phi}_{t}= \int_{0}^{t}\dot{b}\bigl(X_{s}^{0} \bigr)Y_{s}^{\phi}\,ds+ \int_{0}^{t}\sigma \bigl(X^{0}_{s} \bigr)\dot{\phi}(s)\,ds.$$

### Proof

See Theorem 2.2 of [3] and Proposition 3.2 of [15], or Theorem 3.1 of [6]. □

### Lemma 5

Under conditions (A2) and (A3), the deterministic dynamical system $$X_{t}^{0}(\theta)$$ is differentiable with respect to θ at the point $$\theta^{*}$$ in $$L_{2}$$-norm, that is,
$$\int_{0}^{T} \biggl(X_{t}^{0} \bigl(\theta^{*}+h\bigr)-X_{t}^{0}\bigl(\theta^{*} \bigr)-h^{\prime}\frac {\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}} \biggr)^{2} \alpha(dt)=o\bigl( \vert h \vert \bigr).$$

### Proof

From Eq. (2), we have
\begin{aligned}[b] X_{t}^{0}(\theta)&=x_{0}+ \int_{0}^{t}S_{s}\bigl( \theta,X^{0}(\theta)\bigr)\,ds \\ &=x_{0}+ \int_{0}^{t} \biggl[V \bigl(\theta,s,X^{0}( \theta) \bigr)+ \int_{0}^{s}K \bigl(\theta,s,u,X_{u}^{0}( \theta) \bigr)\,du \biggr]\,ds. \end{aligned}
(27)
So
$$\begin{gathered} X_{t}^{0}\bigl(\theta^{*}+h \bigr)-X_{t}^{0}\bigl(\theta^{*}\bigr) \\ \quad= \int_{0}^{t}S_{s}\bigl( \theta^{*}+h,X^{0}\bigl(\theta^{*}+h\bigr)\bigr)\,ds- \int_{0}^{t}S_{s}\bigl(\theta ^{*},X^{0}\bigl(\theta^{*}\bigr)\bigr)\,ds \\ \quad= \int_{0}^{t} \bigl[V\bigl(\theta^{*}+h,s,X^{0} \bigl(\theta^{*}+h\bigr)\bigr)-V\bigl(\theta ^{*},s,X^{0}\bigl(\theta^{*}+h \bigr)\bigr) \bigr]\,ds \\ \qquad{} + \int_{0}^{t} \bigl[V\bigl(\theta^{*},s,X^{0} \bigl(\theta^{*}+h\bigr)\bigr)-V\bigl(\theta ^{*},s,X^{0}\bigl(\theta^{*} \bigr)\bigr) \bigr]\,ds \\ \qquad{} + \int_{0}^{t} \biggl[ \int_{0}^{s}K\bigl(\theta^{*}+h,s,u,X_{u}^{0} \bigl(\theta^{*}+h\bigr)\bigr)\,du- \int _{0}^{s}K\bigl(\theta^{*},s,u,X_{u}^{0} \bigl(\theta^{*}+h\bigr)\bigr)\,du \biggr]\,ds \\ \qquad{} + \int_{0}^{t} \biggl[ \int_{0}^{s}K\bigl(\theta^{*},s,u,X^{0}_{u} \bigl(\theta^{*}+h\bigr)\bigr)\,du- \int _{0}^{s}K\bigl(\theta^{*},s,u,X_{u}^{0} \bigl(\theta^{*}\bigr)\bigr)\,du \biggr]\,ds. \end{gathered}$$
Applying conditions (A2), (A3) and Gronwall’s inequality, we have
\begin{aligned} \sup_{0\leq s\leq T} \int^{s}_{0} \bigl(X_{u}^{0} \bigl(\theta^{*}+h\bigr)-X_{u}^{0}\bigl(\theta ^{*}\bigr) \bigr)^{2}\alpha(du) \leq\mathcal{M}T \vert h \vert e^{T}, \end{aligned}
(28)
where $$\mathcal{M}$$ is some positive constant. From (27), we also get that
$$\frac{\partial X_{t}^{0}}{\partial\theta}= \int_{0}^{t}\frac{\partial S_{s}(\theta,X^{0})}{\partial\theta}\,ds+ \int_{0}^{t}\frac{\partial S_{s}(\theta,X^{0})}{\partial x}\cdot\frac{\partial X^{0}_{s}}{\partial\theta}\,ds$$
and
\begin{aligned}[b] &X_{t}^{0}\bigl(\theta^{*}+h \bigr)-X_{t}^{0}\bigl(\theta^{*}\bigr)-h^{\prime} \frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}} \\ &\quad= \int_{0}^{t} \biggl[S_{s}\bigl( \theta^{*}+h,X^{0}\bigl(\theta^{*}+h\bigr)\bigr)-S_{s}\bigl(\theta ^{*}+h,X^{0}\bigl(\theta^{*}\bigr)\bigr) \\ &\qquad -h^{\prime}\frac{\partial X_{s}^{0}}{\partial\theta}\cdot\frac{\partial S_{s}(\theta,X^{0})}{\partial x}\bigg|_{\theta=\theta^{*}} \biggr]\,ds \\ &\qquad + \int_{0}^{t} \biggl[S_{s}\bigl( \theta^{*}+h,X^{0}\bigl(\theta^{*}\bigr)\bigr)-S_{s}\bigl(\theta ^{*},X^{0}\bigl(\theta^{*}\bigr)\bigr)-h^{\prime}\frac{\partial S_{s}(\theta ,X^{0})}{\partial\theta}\bigg|_{\theta=\theta^{*}} \biggr]\,ds \\ &\quad=\mathbf{I}_{1}+\mathbf{I}_{2}. \end{aligned}
(29)
Taylor’s expansion and (A3) imply that $$|\mathbf{I}_{2}|\leq\mathcal {M}|h|^{2}T$$.
For $$\mathbf{I}_{1}$$, let
$$A_{s}(\theta)= \int_{0}^{1}\frac{\partial}{\partial x}S_{s} \bigl( \theta +h,X_{s}^{0}(\theta)+u\bigl(X_{s}^{0}( \theta+h)-X_{s}^{0}(\theta)\bigr) \bigr)\,du,$$
then
\begin{aligned}& \int_{0}^{T}(\mathbf{I}_{1})^{2} \alpha(dt) \\& \quad= \int_{0}^{T} \biggl( \int_{0}^{t} \biggl[A_{s}\bigl(\theta^{*} \bigr) \bigl(X^{0}_{s}\bigl(\theta ^{*}+h\bigr)-X^{0}_{s} \bigl(\theta^{*}\bigr)\bigr)-h^{\prime}\frac{\partial X_{s}^{0}}{\partial \theta}\frac{\partial S_{s}(\theta,X^{0})}{\partial x}\bigg|_{\theta =\theta^{*}} \biggr]\,ds \biggr)^{2}\alpha(dt) \\& \quad\leq \int_{0}^{T} \biggl[ \int_{0}^{t}A_{s}\bigl(\theta^{*}\bigr) \biggl( \bigl(X_{s}^{0}\bigl(\theta ^{*}+h\bigr)-X_{s}^{0} \bigl(\theta^{*}\bigr) \bigr)-h^{\prime}\frac{\partial X_{s}^{0}}{\partial \theta} \biggr)\,ds \biggr]^{2}\alpha(dt) \\& \qquad{} + \int_{0}^{T} \biggl[ \int_{0}^{t} \biggl(A_{s}\bigl(\theta^{*} \bigr)-\frac{\partial S_{s}(\theta+h,X_{s}^{0})}{\partial x}\bigg|_{\theta=\theta^{*}} \biggr)h^{\prime} \frac{\partial X_{s}^{0}}{\partial\theta}\,ds \biggr]^{2}\alpha (dt) \\& \qquad{} + \int_{0}^{T} \biggl[ \int_{0}^{t} \biggl(\frac{\partial S_{s}(\theta +h,X_{s}^{0})}{\partial x}\bigg|_{\theta=\theta^{*}}- \frac{\partial S_{s}(\theta,X_{s}^{0})}{\partial x}\bigg|_{\theta=\theta^{*}} \biggr)h^{\prime}\frac{\partial X_{s}^{0}}{\partial\theta}\,ds \biggr]^{2}\alpha (dt) \\& \quad=\mathbf{I}_{3}+\mathbf{I}_{4}+\mathbf{I}_{5}. \end{aligned}
Conditions (A2) and (A3) imply that the function $$A_{s}(\theta)$$ is bounded for all $$s\in[0,T]$$. By the differentiability of $$S_{t}(\theta,x)$$ in x for every fixed θ, we get $$\mathbf{I}_{4}=o(|h|)$$ and $$\mathbf{I}_{5}=o(|h|)$$.
On the other hand, we also have
$$\mathbf{I}_{3}\leq\mathcal{M} \int^{T}_{0} \biggl( \int^{t}_{0} \biggl[X_{s}^{0} \bigl(\theta^{*}+h\bigr)-X_{s}^{0}\bigl(\theta^{*} \bigr)-h^{\prime}\frac{\partial X_{s}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}} \biggr]\,ds \biggr)^{2}\alpha(dt).$$
By applying Gronwall’s inequality and combining (28) and (29), we get
$$\int_{0}^{T} \biggl(X_{t}^{0} \bigl(\theta^{*}+h\bigr)-X_{t}^{0}\bigl(\theta^{*} \bigr)-h^{\prime}\frac {\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}} \biggr)^{2} \alpha(dt)=o\bigl( \vert h \vert \bigr).$$
The proof is completed. □
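The estimate just proved can be checked numerically in a simple special case. The sketch below is only an illustration under assumptions that are not part of the proof: it takes the hypothetical drift $$S_{t}(\theta,x)=-\theta x$$, so that $$X_{t}^{0}(\theta)=x_{0}e^{-\theta t}$$, chooses α to be Lebesgue measure on $$[0,T]$$, and uses the function names `x0_path`, `dx0_dtheta` and `remainder` purely for this example. It evaluates the squared linearization remainder divided by $$|h|$$ for decreasing $$h$$.

```python
import numpy as np

# Hypothetical example: drift S_t(theta, x) = -theta * x, so the limiting
# ODE solution is X_t^0(theta) = x_init * exp(-theta * t).
def x0_path(theta, t, x_init=1.0):
    return x_init * np.exp(-theta * t)

# Derivative of X_t^0(theta) with respect to theta.
def dx0_dtheta(theta, t, x_init=1.0):
    return -t * x_init * np.exp(-theta * t)

def remainder(h, theta_star=1.0, T=1.0, n=2001):
    """Integral over [0, T] of the squared first-order remainder,
    with alpha(dt) assumed to be Lebesgue measure (trapezoidal rule)."""
    t = np.linspace(0.0, T, n)
    r = (x0_path(theta_star + h, t) - x0_path(theta_star, t)
         - h * dx0_dtheta(theta_star, t))
    dt = t[1] - t[0]
    return float(np.sum((r[:-1] ** 2 + r[1:] ** 2) * 0.5 * dt))

hs = [10.0 ** (-k) for k in range(1, 5)]
ratios = [remainder(h) / abs(h) for h in hs]
for h, q in zip(hs, ratios):
    print(f"h = {h:.0e}   remainder / |h| = {q:.3e}")
```

In this smooth example the ratio shrinks rapidly as $$h\rightarrow0$$, which is consistent with the $$o(|h|)$$ bound.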

### Proof of Theorem 1

Noting that
$$\frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}}= \biggl(\frac{\partial X_{t}^{0}}{\partial\theta_{1}},\ldots, \frac{\partial X_{t}^{0}}{\partial\theta_{p}} \biggr)^{\prime}\bigg|_{\theta=\theta^{*}}$$
and $$h(\epsilon)=\sqrt{2\log\log(\epsilon^{-1}\vee3)}$$, we define
\begin{aligned}[b] G_{\epsilon}(u)&=h(\epsilon) \biggl[u^{\prime} \int_{0}^{T}\frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}}\cdot \biggl(\frac {\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}} \biggr)^{\prime}\alpha(dt)\cdot u \\ &\quad -2u^{\prime} \int_{0}^{T}\frac{1}{\epsilon h(\epsilon)}\bigl(X_{t}^{\epsilon}-X_{t}^{0}\bigl(\theta^{*}\bigr)\bigr)\frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}} \alpha(dt) \biggr]. \end{aligned}
(30)
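As a side remark on the shape of $$G_{\epsilon}$$: it is a quadratic form in u, so its minimizer can be written explicitly. The shorthand $$Q$$ and $$\xi_{\epsilon}$$ below is introduced only for this remark, and the positive definiteness of $$Q$$ is an assumption made here for illustration. With
$$Q= \int_{0}^{T}\frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}}\cdot \biggl(\frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}} \biggr)^{\prime}\alpha(dt),\qquad \xi_{\epsilon}= \int_{0}^{T}\frac{1}{\epsilon h(\epsilon)}\bigl(X_{t}^{\epsilon}-X_{t}^{0}\bigl(\theta^{*}\bigr)\bigr)\frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}}\alpha(dt),$$
completing the square gives
$$G_{\epsilon}(u)=h(\epsilon) \bigl[u^{\prime}Qu-2u^{\prime}\xi_{\epsilon} \bigr]=h(\epsilon) \bigl[ \bigl(u-Q^{-1}\xi_{\epsilon} \bigr)^{\prime}Q \bigl(u-Q^{-1}\xi_{\epsilon} \bigr)-\xi_{\epsilon}^{\prime}Q^{-1}\xi_{\epsilon} \bigr],$$
so that, under this assumption, $$G_{\epsilon}$$ is strictly convex with unique minimizer $$u=Q^{-1}\xi_{\epsilon}$$. This indicates why the asymptotics of the argmin are governed by the behaviour of $$\xi_{\epsilon}$$.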
A simple calculation yields
\begin{aligned}[b] \frac{ \vert V_{\epsilon}(u)-G_{\epsilon}(u) \vert }{h(\epsilon)} &= \int_{0}^{T} \biggl[ \biggl(\frac{X_{t}^{0}(\theta^{*}+h(\epsilon)\epsilon u)-X_{t}^{0}(\theta^{*})}{\epsilon h(\epsilon)} \biggr)^{2}- \biggl(u^{\prime }\frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}} \biggr)^{2} \biggr]\alpha(dt) \\ &\quad +\frac{\lambda_{\epsilon}}{\epsilon^{2}h^{2}(\epsilon)}\sum_{j=1}^{p} \bigl\{ \bigl\vert \theta_{j}^{*}+h(\epsilon)\epsilon u_{j} \bigr\vert ^{\gamma}- \bigl\vert \theta _{j}^{*} \bigr\vert ^{\gamma}\bigr\} -2 \int_{0}^{T}\frac{X_{t}^{\epsilon}-X_{t}^{0}(\theta ^{*})}{\epsilon h(\epsilon)} \\ & \quad\cdot \biggl(\frac{X_{t}^{0}(\theta^{*}+h(\epsilon)\epsilon u)-X_{t}^{0}(\theta^{*})}{\epsilon h(\epsilon)}-u^{\prime}\frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}} \biggr)\alpha(dt) \\ &:=\mathbf{J}_{1}+\mathbf{J}_{2}-2\mathbf{J}_{3}. \end{aligned}
(31)
For the case of $$\gamma>1$$, noting that $$\epsilon^{-1}\lambda _{\epsilon}\rightarrow\lambda_{0}$$ as $$\epsilon\rightarrow0$$, we have
\begin{aligned} \mathbf{J}_{2}=\frac{\lambda_{\epsilon}}{\epsilon h(\epsilon)}\sum _{j=1}^{p}u_{j}\frac{ \vert \theta_{j}^{*}+h(\epsilon)\epsilon u_{j} \vert ^{\gamma}- \vert \theta_{j}^{*} \vert ^{\gamma}}{ h(\epsilon)\epsilon u_{j}} \sim\frac{\gamma\lambda_{0}}{ h(\epsilon)}\sum_{j=1}^{p}u_{j} \operatorname {sgn}\bigl(\theta _{j}^{*}\bigr) \bigl\vert \theta_{j}^{*} \bigr\vert ^{\gamma-1}\rightarrow0, \end{aligned}
where $$\operatorname {sgn}(x)=1$$ for $$x>0$$; $$\operatorname {sgn}(x)=-1$$ for $$x<0$$ and $$\operatorname {sgn}(0)=0$$.
For the case of $$\gamma=1$$,
\begin{aligned} \mathbf{J}_{2}&=\frac{\lambda_{\epsilon}}{\epsilon^{2} h^{2}(\epsilon )}\sum _{j=1}^{p}\bigl\{ \bigl\vert \theta_{j}^{*}+\epsilon h(\epsilon) u_{j} \bigr\vert - \bigl\vert \theta_{j}^{*} \bigr\vert \bigr\} \\ &\sim\frac{\lambda_{0}}{ h(\epsilon)}\sum_{j=1}^{p} \bigl( \vert u_{j} \vert \mathcal {I}\bigl\{ \theta_{j}^{*}=0 \bigr\} +u_{j}\operatorname {sgn}\bigl(\theta_{j}^{*}\bigr) \mathcal{I}\bigl\{ \theta_{j}^{*} \neq0\bigr\} \bigr) \\ &\rightarrow0, \end{aligned}
where $$\mathcal{I}\{\cdot\}$$ denotes the indicator function. Lemma 5 implies that $$\mathbf{J}_{1}\rightarrow0$$. Thus, for any $$\delta>0$$ and sufficiently small $$\epsilon>0$$, we have
$$\mathbf{P} \bigl( \bigl\vert V_{\epsilon}(u)-G_{\epsilon}(u) \bigr\vert \geq h(\epsilon)\delta \bigr)\leq\mathbf{P} \biggl( \vert \mathbf{J}_{3} \vert \geq\frac{\delta}{4} \biggr).$$
(32)
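The vanishing of the penalty term $$\mathbf{J}_{2}$$ used above can be inspected numerically. The following sketch assumes the specific choice $$\lambda_{\epsilon}=\lambda_{0}\epsilon$$ (one sequence satisfying $$\epsilon^{-1}\lambda_{\epsilon}\rightarrow\lambda_{0}$$); the function names `h` and `j2` and the test values of $$\theta^{*}$$ and u are hypothetical and chosen only for illustration.

```python
import math

def h(eps):
    """h(eps) = sqrt(2 * log log(max(1/eps, 3)))."""
    return math.sqrt(2.0 * math.log(math.log(max(1.0 / eps, 3.0))))

def j2(eps, u, theta_star, gamma, lam0=1.0):
    """Penalty term J_2 under the assumed choice lambda_eps = lam0 * eps."""
    lam_eps = lam0 * eps
    s = eps * h(eps)  # perturbation scale eps * h(eps)
    total = sum(abs(th + s * uj) ** gamma - abs(th) ** gamma
                for th, uj in zip(theta_star, u))
    return lam_eps * total / s ** 2

theta_star = (1.0, 0.0)   # one nonzero and one zero component
u = (1.0, 1.0)
for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    print(f"eps = {eps:.0e}  h = {h(eps):.3f}  "
          f"J2(gamma=1) = {j2(eps, u, theta_star, 1.0):.4f}  "
          f"J2(gamma=2) = {j2(eps, u, theta_star, 2.0):.4f}")
```

Since $$\mathbf{J}_{2}$$ is of order $$1/h(\epsilon)$$ and $$h(\epsilon)$$ grows only like an iterated logarithm, the decay visible in such a table is slow. Very small values of ϵ are avoided here because the increments $$\epsilon h(\epsilon)u_{j}$$ would fall below floating-point resolution.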
In addition, Lemma 5 also implies that for any $$\kappa>0$$, there exists a positive constant $$\epsilon_{0}$$ such that, when $$\epsilon<\epsilon_{0}$$,
$$\biggl\vert \frac{X_{t}^{0}(\theta^{*}+h(\epsilon)\epsilon u)-X_{t}^{0}(\theta ^{*})}{\epsilon h(\epsilon)}-u^{\prime}\frac{\partial X_{t}^{0}}{\partial \theta}\bigg|_{\theta=\theta^{*}} \biggr\vert \leq\kappa.$$
By the Cauchy–Schwarz inequality, we have
\begin{aligned} \vert \mathbf{J}_{3} \vert ^{2}&\leq \int_{0}^{T} \biggl(\frac{X_{t}^{\epsilon}-X_{t}^{0}(\theta ^{*})}{\epsilon h(\epsilon)} \biggr)^{2}\alpha(dt) \\ & \quad\cdot \int_{0}^{T} \biggl(\frac{X_{t}^{0}(\theta^{*}+h(\epsilon)\epsilon u)-X_{t}^{0}(\theta^{*})}{\epsilon h(\epsilon)}-u^{\prime} \frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}} \biggr)^{2}\alpha(dt) \\ &\leq T\kappa^{2} \int_{0}^{T} \biggl(\frac{X_{t}^{\epsilon}-X_{t}^{0}(\theta ^{*})}{\epsilon h(\epsilon)} \biggr)^{2}\alpha(dt). \end{aligned}
Thus, for sufficiently small $$\delta>0$$,
$$\mathbf{P} \biggl( \vert \mathbf{J}_{3} \vert \geq \frac{\delta}{4} \biggr)\leq \mathbf{P} \biggl( \biggl( \int_{0}^{T} \biggl(\frac{X_{t}^{\epsilon}-X_{t}^{0}(\theta^{*})}{\epsilon h(\epsilon)} \biggr)^{2}\alpha(dt) \biggr)^{\frac {1}{2}}\geq\frac{\delta}{4\sqrt{T}\kappa} \biggr).$$
(33)
One can easily check that (A2)–(A3) imply Assumption (A) of [6]. By Lemma 4, the stochastic process $$(X_{t}^{\epsilon}-X_{t}^{0}(\theta^{*}))/(\epsilon h(\epsilon))$$ satisfies the law of the iterated logarithm on $$C([0,T];\mathbb{R})$$ with the rate function $$I(\cdot)$$, that is,
$$\limsup_{\epsilon\rightarrow0}\rho \biggl(\frac{X_{t}^{\epsilon}-X^{0}_{t}(\theta^{*})}{\sqrt{2\epsilon\log\log(\epsilon^{-1}\vee 3)}},K_{1} \biggr)=0,\quad \text{a.e.}$$
and
$$\mathbf{P} \biggl(\omega: \mathbf{C} \biggl( \biggl\{ \frac{X_{t}^{\epsilon}-X^{0}_{t}}{\sqrt{2\epsilon\log\log(\epsilon^{-1}\vee3)}} \biggr\} \biggr)=K_{1} \biggr)=1,$$
where $$K_{1}= \{g: I(g)\leq\frac{1}{2} \}$$ and
$$I(g):=\inf_{\phi\in\mathcal{H}:g=Y_{t}^{\phi}} \biggl\{ \frac{1}{2} \int _{0}^{T} \bigl\vert \dot{\phi}(s) \bigr\vert ^{2}\,ds \biggr\} .$$
Here $$\mathcal{H}$$ is defined in (12) and $$Y^{\phi}$$ in (13).
The invariance principle (see Theorem 4.3 of [9]) yields that the stochastic process
$$\int_{0}^{T}\frac{1}{\epsilon h(\epsilon)}\bigl(X_{t}^{\epsilon}-X_{t}^{0} \bigl(\theta ^{*}\bigr)\bigr)\frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta ^{*}}\alpha(dt)$$
satisfies the law of the iterated logarithm with rate function $$I^{*}(\cdot)$$, that is,
$$\limsup_{\epsilon\rightarrow0}\rho \biggl( \int_{0}^{T}\frac{1}{\epsilon h(\epsilon)}\bigl(X_{t}^{\epsilon}-X_{t}^{0} \bigl(\theta^{*}\bigr)\bigr)\frac{\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}}\alpha(dt),K \biggr)=0,\quad \text{a.e.}$$
(34)
and
$$\mathbf{P} \biggl(\omega: \mathbf{C} \biggl( \biggl\{ \int_{0}^{T}\frac {1}{\epsilon h(\epsilon)}\bigl(X_{t}^{\epsilon}-X_{t}^{0} \bigl(\theta^{*}\bigr)\bigr)\frac {\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}}\alpha (dt) \biggr\} \biggr)=K \biggr)=1,$$
(35)
where K is defined in (11) and
$$I^{*}(x)=\inf_{g\in\mathcal{H}} \biggl\{ I(g): x= \int_{0}^{T}g\frac {\partial X_{t}^{0}}{\partial\theta}\bigg|_{\theta=\theta^{*}}\alpha (dt) \biggr\} .$$
On the other hand, letting $$\kappa\rightarrow0$$ in (33), we have
$$\limsup_{\epsilon\rightarrow0}\frac{1}{h^{2}(\epsilon)}\log\mathbf {P} \biggl( \biggl( \int_{0}^{T} \biggl(\frac{X_{t}^{\epsilon}-X_{t}^{0}(\theta ^{*})}{\epsilon h(\epsilon)} \biggr)^{2}\alpha(dt) \biggr)^{\frac{1}{2}}\geq \frac{\delta}{4\sqrt{T}} \biggr)=-\infty.$$
Hence, for any $$\delta>0$$ and $$R>0$$, there exists an $$\epsilon_{0}>0$$ such that for all $$\epsilon\in(0,\epsilon_{0}]$$,
$$\mathbf{P} \bigl( \bigl\vert V_{\epsilon}(u)-G_{\epsilon}(u) \bigr\vert \geq h(\epsilon)\delta \bigr)\leq e^{-Rh^{2}(\epsilon)}.$$
(36)
Combining (30) and (34)–(36) and applying Lemma 3, we see that the process $$(\widehat{\theta}^{\epsilon}-\theta^{*})/(\epsilon h(\epsilon))$$ satisfies the law of the iterated logarithm. The proof is completed. □
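The normalization $$(\widehat{\theta}^{\epsilon}-\theta^{*})/(\epsilon h(\epsilon))$$ can be illustrated by a minimal simulation sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the scalar model $$dX_{t}^{\epsilon}=-\theta X_{t}^{\epsilon}\,dt+\epsilon\,dW_{t}$$ with $$\theta^{*}=1$$ and $$x_{0}=1$$, Lebesgue measure for α, the penalty level $$\lambda_{\epsilon}=\lambda_{0}\epsilon$$ with $$\lambda_{0}=0.1$$, minimization by grid search, and the helper names `simulate_ou` and `penalized_mde`.

```python
import numpy as np

def h(eps):
    """h(eps) = sqrt(2 * log log(max(1/eps, 3)))."""
    return np.sqrt(2.0 * np.log(np.log(max(1.0 / eps, 3.0))))

def simulate_ou(theta, eps, x_init, t, rng):
    """Sample dX = -theta * X dt + eps dW on the grid t, using the exact
    Gaussian transition of this linear model (no discretization bias)."""
    dt = t[1] - t[0]
    a = np.exp(-theta * dt)
    sd = eps * np.sqrt((1.0 - a * a) / (2.0 * theta))
    x = np.empty_like(t)
    x[0] = x_init
    for k in range(1, len(t)):
        x[k] = a * x[k - 1] + sd * rng.standard_normal()
    return x

def penalized_mde(x, t, eps, x_init=1.0, lam0=0.1, gamma=1.0, grid=None):
    """Grid-search minimizer of
    int (X_t - X_t^0(theta))^2 dt + lam0 * eps * |theta|^gamma,
    where X_t^0(theta) = x_init * exp(-theta * t)."""
    if grid is None:
        grid = np.linspace(0.5, 1.5, 4001)
    dt = t[1] - t[0]
    model = x_init * np.exp(-np.outer(grid, t))  # rows: candidate theta
    sq = (x[None, :] - model) ** 2
    dist = dt * (sq[:, :-1] + sq[:, 1:]).sum(axis=1) / 2.0  # trapezoid
    objective = dist + lam0 * eps * np.abs(grid) ** gamma
    return float(grid[np.argmin(objective)])

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 501)
theta_star = 1.0
for eps in (1e-1, 1e-2, 1e-3):
    x = simulate_ou(theta_star, eps, 1.0, t, rng)
    theta_hat = penalized_mde(x, t, eps)
    scaled = (theta_hat - theta_star) / (eps * h(eps))
    print(f"eps = {eps:.0e}  theta_hat = {theta_hat:.4f}  scaled error = {scaled:.3f}")
```

A single path per noise level only shows the order of magnitude of the scaled error; it does not verify the almost sure statement, which concerns the whole family of estimators as $$\epsilon\rightarrow0$$. The grid spacing also limits the resolution of the scaled error for the smallest ϵ.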

## 5 Conclusion

In this paper, we studied the rate at which the estimator $$\widehat{\theta}^{\epsilon}$$ converges to the true value $$\theta^{*}$$. A simple example was given to illustrate the feasibility of the result. Because minimizing the convex objective process is computationally involved, we did not conduct an extensive simulation study to identify these stochastic diffusion processes or to illustrate the finite-sample performance of the proposed method. We recently became aware of several novel modeling approaches, such as the accurate discretization method [12], the two coupled pendulums method [4], fractional stochastic modeling [2], and fractional discretization [1]. These methods may be helpful for such simulations, which we regard as an important direction for future research.

## Notes

### Acknowledgements

The authors thank three anonymous reviewers for their valuable comments and suggestions in improving the paper.

### Authors’ contributions

All authors contributed equally to the manuscript and typed, read and approved the final manuscript.

### Funding

This work is supported by the National Natural Science Foundation of China under NSFC grant (No. 11571326).

### Competing interests

The authors declare that they have no competing interests.

## References

1. Atangana, A.: Fractional discretization: the African’s tortoise walk. Chaos Solitons Fractals 130, Article ID 109399 (2020)
2. Atangana, A., Bonyah, E.: Fractional stochastic modeling: new approach to capture more heterogeneity. Chaos, Interdiscip. J. Nonlinear Sci. 29, Article ID 013118 (2019)
3. Baldi, P.: Large deviations and functional iterated logarithm law for diffusion processes. Probab. Theory Relat. Fields 71, 435–453 (1986)
4. Baleanu, D., Jajarmi, A., Asad, J.H.: Classical and fractional aspects of two coupled pendulums. Rom. Rep. Phys. 71(1), Article ID 103 (2019)
5. Bressloff, P.C.: Stochastic Processes in Cell Biology. Interdisciplinary Applied Mathematics, vol. 41. Springer, New York (2014)
6. Caramellino, L.: Strassen’s law of the iterated logarithm for diffusion processes for small time. Stoch. Process. Appl. 74(1), 1–19 (1998)
7. Dietz, H.M., Kutoyants, Y.A.: A class of minimum-distance estimators for diffusion processes with ergodic properties. Stat. Risk. Model. 15(3), 211–228 (1997)
8. Freidlin, M.I., Szücs, J., Wentzell, A.D.: Random Perturbations of Dynamical Systems. Grundlehren der mathematischen Wissenschaften, vol. 260. Springer, New York (2012)
9. Gao, F.G., Wang, S.C.: Asymptotic behaviors for functionals of random dynamical systems. Stoch. Anal. Appl. 34, 258–277 (2015)
10. Geyer, C.J.: On the asymptotics of convex stochastic optimization. Unpublished manuscript (1996)
11. Gregorio, A.D., Iacus, S.M.: On penalized estimation for dynamical systems with small noise. Electron. J. Stat. 12, 1614–1630 (2018)
12. Hajipour, M., Jajarmi, A., Baleanu, D.: On the accurate discretization of a highly nonlinear boundary value problem. Numer. Algorithms 79(3), 679–695 (2018)
13. Kato, K.: Asymptotics for argmin processes: convexity arguments. J. Multivar. Anal. 100, 1816–1829 (2009)
14. Knight, K., Fu, W.J.: Asymptotics for lasso-type estimators. Ann. Stat. 28, 1356–1378 (2000)
15. Kouritzin, M.A., Heunis, A.J.: A law of the iterated logarithm for stochastic processes defined by differential equations with a small parameter. Ann. Probab. 22(2), 659–679 (1994)
16. Kutoyants, Y.: Identification of Dynamical Systems with Small Noise. Kluwer Academic, Dordrecht (1994)
17. Kutoyants, Y., Pilibossian, P.: On minimum $$L_{1}$$-norm estimate of the parameter of the Ornstein–Uhlenbeck process. Stat. Probab. Lett. 20(2), 117–123 (1994)
18. Kutoyants, Y., Pilibossian, P.: On minimum uniform metric estimate of parameters of diffusion-type processes. Stoch. Process. Appl. 51(2), 259–267 (1994)
19. Liptser, R.S., Shiryayev, A.N.: Statistics of Random Processes, vol. I. Springer, New York (1977)
20. Liptser, R.S., Shiryayev, A.N.: Statistics of Random Processes, vol. II. Springer, New York (1978)
21. Nkurunziza, S.: Shrinkage strategies in some multiple multi-factor dynamical systems. ESAIM, Probab. Stat. 16, 139–150 (2012)
22. Zhao, H., Zhang, C.: Minimum distance parameter estimation for SDEs with small α-stable noises. Stat. Probab. Lett. 145, 301–311 (2019)