1. INTRODUCTION AND STATEMENT OF THE PROBLEM

We consider the following regression model:

$$ X_i=f_{\bf \theta }({\bf z}_i)+\xi _i, \qquad i=1,\ldots ,n,$$
(1)

where \(\{f_{\theta }({\bf t});\thinspace {{\mathbf \theta }}=(\theta _1,\ldots ,\theta _m)\in {\bf \Theta }\}\) is a parametric family of real-valued continuous functions of \(k \) variables, the \(k \)-dimensional vectors \({\bf t}=(t_1,\ldots ,t_k) \) belong to a Jordan-measurable compact set \({\cal P}\subset {\mathbb R}^k\) without isolated points, and \(\bf \Theta \) is an open set in \({\mathbb R}^m \). The collection of regressors \(\{{\bf z}_i;\thinspace i=1,\ldots ,n\}\) consists of observable random \(k \)-dimensional vectors with unknown (in general) distributions and with values in \(\cal P\), not necessarily independent or identically distributed. This collection of regressors may be considered as an array, i.e., the vectors \( \{{\bf z}_i; \; i=1,\ldots ,n\}\) may depend on \(n \). In particular, this scheme includes models with fixed regressors (for example, an equidistant design). The errors \(\{\xi _i;\thinspace i=1,\ldots ,n\}\) are centered random variables (additional conditions will be given at the end of this section).

The task is to construct explicit estimators for the unknown true value of the \(m \)-dimensional parameter \({{\mathbf \theta }}\in {\bf \Theta }\) from a sample of \((k+1) \)-dimensional observations \(\{(X_i, {\bf z}_i);\thinspace i=1,\ldots ,n\}\). Note that the problem of constructing explicit estimators in nonlinear regression models with certain restrictions on their parameters has already attracted the attention of researchers (see [1, 3, 7, 8, 16, 19, 20, 21, 23, 25, 27, 28, 34, 36, 37]). First of all, we are talking about models in which estimation can be reduced to linear regression problems. Traditionally (see, for example, [7, 8, 25, 27]), intrinsically linear (or nonintrinsically nonlinear) models are regression models for which the regression equation can be reduced, by one or another transformation of the responses \(\{X_i\}\) and of the initial parameter \( {\mathbf \theta }\), to a linear one. As a rule, these are models with multiplicative noise (see, for example, [27]). In [20], the definition of intrinsically linear models was clarified and expanded, and it was established that a number of well-known nonlinear regression models with additive errors satisfy this definition. For such models, explicit parameter estimators are constructed using the methods of linear regression analysis.

It is worth noting that the ability to transform the original regression model to a linear one (especially in the case of additive errors) is a rare exception rather than the rule. In [16] and [19], an approach was proposed that, under broad conditions, allows one to construct explicit consistent estimators of the parameters of nonlinear regression models that are not intrinsically linear. In the present paper, we further expand this class of nonlinear regression models.

It is important to note that, in addition to being of independent interest, the construction of explicit consistent estimators for nonlinear regression models is of exceptional interest for one-step estimation. It is well known that, in nonlinear regression problems, asymptotically optimal estimators obtained, for example, by the quasi-likelihood, least squares, and maximum likelihood methods are often given implicitly as solutions to certain equations (see, for example, the monographs [3, 7, 9, 11, 25, 30, 34, 36, 37]). Moreover, the situation where the equation determining an estimator has several roots is quite typical (see, for example, [31,32,33]). This circumstance is the main problem that complicates the use of numerical methods: if the initial approximation of the parameter is chosen unsuccessfully, iterative procedures find only the root closest to the starting point, which need not be the one closest to the parameter. One way to work around this problem is to use so-called one-step estimators. The idea of one-step estimation, going back to R. Fisher’s works, is as follows: the starting point of an iterative Newton-type procedure is not an arbitrary point but a preliminary consistent estimator that converges to the parameter at a certain rate. It turns out that, in this case, one step of the iterative procedure is often sufficient to obtain an explicit estimator (the so-called one-step estimator) that has the same asymptotic accuracy as the statistic of interest (see, for example, [35]). In fact, if we have some preliminary consistent estimator, then Newton-type procedures allow us to isolate the root of the equation of interest that approximates the parameter.

In recent years, interest in one-step estimation in the statistical literature has only been growing, and the bibliography in this area is very extensive (a number of bibliographic references can be found, for example, in [14]). The importance of developing a one-step estimation methodology specifically for nonlinear regression problems is emphasized, for example, in the monograph [31]. In various formulations of nonlinear regression problems, one-step estimators are studied, for example, in [4, 23, 24, 26, 29, 31], but the existence of preliminary estimators in these papers (with the exception of [23], devoted to fractional-linear models) is only postulated. In the papers [13,14,15] related to one-step estimation in nonlinear regression, the technique from [16] and [19] is used to construct preliminary estimators. The new estimators proposed in the present paper can also be used as preliminary estimators in one-step estimation procedures.

Let us return to the conditions for Model (1). We additionally assume that the random errors \(\{\xi _i;\thinspace i=1,\ldots ,n\}\) form a sequence of martingale differences under the condition

$$ M_p=\sup _{i\leq n}{\mathbb E}|\xi _i|^p<\infty \quad \mbox {for some} \thinspace \thinspace p>k \thinspace \thinspace \mbox {and}\thinspace \thinspace p\ge 2,$$

where \(M_p \) does not depend on \(n \). It is also assumed that the random variables \(\{\xi _i\} \) are independent of \(\{{\bf z}_i\} \) but may depend on \(n \). Next, we assume that, with probability \(1 \), each observed point \({\bf z}_i \) from the set of regressors has multiplicity \(1 \) in the sample. Note that if the sample \(\{{\bf z}_i; \thinspace i=1,\ldots ,n\}\) contained multiple points then we could consider the arithmetic mean of the responses \(X_i \) with identical regressors and thereby reduce the problem to the original one. It is important to note that if the multiplicity of some point from the set of regressors grows with \(n \) then the problem reduces to a classical statistical formulation of the method of moments. Therefore, we exclude such cases from consideration. So, even in the presence of nonrandom regressors, we reject the possibility of “full control” over them, when one could observe a growing number of responses at the same point from the set of regressors.

For each \(n\), we denote by \(\varepsilon _n \) the minimal possible radius of an \(\varepsilon \)-net formed by the set of regressors \(\{{\bf z}_1,\ldots ,{\bf z}_n\} \) in the compact set \(\cal P \). Then the only restriction on the regressors will be the following condition:

(D) \(\varepsilon _n \stackrel {p}{\to } 0\) as \(n\to \infty \).

Remark 1\(. \) If all \({\bf z}_i \) do not depend on \(n \) then convergence in probability in condition (D) is equivalent to almost sure convergence due to the monotonicity of the sequence \(\{\varepsilon _n\}\). For example, if \(\{{\bf z}_i;\thinspace i=1,2,\ldots \}\) is a sequence of identically distributed random vectors (not necessarily stationary) satisfying the strong mixing condition and the compact set \(\cal P \) is the support of the marginal distribution then condition (D) is satisfied. In [12] and [17], examples are given of stronger dependence between the regressors, in which all known weak dependence conditions fail but condition (D) still holds.
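
For illustration, \(\varepsilon _n \) is simply the covering radius \(\sup _{{\bf t}\in {\cal P}}\min _{i\leq n}\|{\bf t}-{\bf z}_i\| \) of the design points. The following minimal sketch (not from the paper) approximates this quantity on a grid, assuming \({\cal P}=[0,1]^k \), the Euclidean norm, and an i.i.d. uniform design; all names and parameters are illustrative only.

```python
import numpy as np

def covering_radius(z, grid_per_dim=100):
    """Approximate eps_n = sup_{t in [0,1]^k} min_i ||t - z_i|| (Euclidean norm)
    by maximizing over a regular grid; a purely illustrative helper."""
    n, k = z.shape
    axes = [np.linspace(0.0, 1.0, grid_per_dim)] * k
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), -1).reshape(-1, k)
    radius = 0.0
    for chunk in np.array_split(grid, max(1, len(grid) // 1000)):
        d = np.min(np.linalg.norm(chunk[:, None, :] - z[None, :, :], axis=-1), axis=1)
        radius = max(radius, float(d.max()))
    return radius

rng = np.random.default_rng(0)
z_all = rng.random((5000, 2))                          # one sequence z_1, z_2, ... with k = 2
for n in (50, 500, 5000):
    print(n, round(covering_radius(z_all[:n]), 4))     # eps_n decreases, so condition (D) holds
```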

2. METHODOLOGY FOR OBTAINING EXPLICIT ESTIMATORS

For simplicity of presentation, we further set \({\cal P}=[0,1]^k \). Denote by \(C[0,1]^{k} \) the space of continuous functions on \([0,1]^{k} \). Suppose that the space \(C[0,1]^{k} \) is endowed with some norm \(\|\cdot \| \). So, in what follows, we will consider the linear normed space \( (C[0,1]^{k},\|\cdot \|)\), which is assumed to be separable with respect to the metric generated by the norm \(\|\cdot \| \). We will be interested mainly in the following two cases:

$$ \|f\|\equiv \|f\|_{sup} =\sup _{{\bf t}\in [0,1]^{k}}|f({\bf t})|, \qquad \quad \|f\|\equiv \|f\|_{pw}=\sum \limits _{i=1}^{\infty }\frac {|f({\bf t}_i)|}{2^i}, $$

where the summation is taken over all points \({\bf t}_i\in [0,1]^{k}\) with rational coordinates numbered in an arbitrary manner. Note that convergence in the \(\|\cdot \|_{pw} \)-norm is equivalent to pointwise convergence of functions in \( C[0,1]^{k}\).

The proposed approach is based on the use of nonparametric kernel estimators of the regression function. We need the following assumptions:

(I) There exists a continuous mapping \({\bf G}: (C[0,1]^{k}, \|\cdot \|)\to {\mathbb R}^m \) for which the vector-valued function \({\bf g}({{\mathbf \theta }})={\bf G}(f_{\bf \theta }) \) is a homeomorphism of the open set \(\bf \Theta \) onto some open domain in \({\mathbb R}^m \);

(II) There exists a \(\|\cdot \|\)-consistent \(( \)or strongly \(\|\cdot \| \)-consistent\() \) nonparametric estimator \(f_n^*({\bf t})\in (C[0,1]^k, \|\cdot \|)\) of the regression function \(f_{{\mathbf \theta }_0}({\bf t})\), where \({{\mathbf \theta }}_0 \) is the true value of the parameter in (1).

We now define an estimator by the formula

$$ {{\mathbf \theta }}_n^*={\bf g}^{-1}({\bf G}(f_n^*)),$$

where \({\bf g}^{-1} \) is the inverse transform for \(\bf g \); in the case where \({\bf G}(f_n^*)\notin \{{\bf g}({{\mathbf \theta }});\thinspace {{\mathbf \theta }}\in \bf \Theta \} \), we set \({\bf g}^{-1}({\bf G}(f_n^*))={\bf 0}\) by definition.

The following assertion is valid.

Theorem 1\(. \) If assumptions (I) and (II) are met then the estimator \({\bf \theta }_n^* \) will be consistent (strongly consistent in accordance with assumption (II)) for \({\bf \theta }_0 \).

The proof of this statement is quite simple. Indeed, the \( \|\cdot \|\)-consistency of the nonparametric estimator \( f_n^*({\bf t})\) means that

$$ \|f_n^*-f_{{\bf \theta }_0}\|\stackrel {p}{\to } 0\thinspace \thinspace \thinspace \mbox {as}\thinspace \thinspace \thinspace n\to \infty .$$

In other words, due to the continuity of the mapping \(\bf G \) in the norm \(\|\cdot \| \) of the space \(C[0,1]^k \), for any given \(\varepsilon >0 \), with probability tending to 1 one has \({\bf G}(f_n^*)\in S_{g({\bf \theta _0})}(\varepsilon )\), where \(S_{g({\bf \theta _0})}(\varepsilon )\) is the open ball of radius \(\varepsilon \) centered at the point \({\bf g}({{\mathbf \theta }_0}) \) in \({\mathbb R}^m \). Since the image of an open set under a homeomorphism is open, the inclusion \(S_{g({\bf \theta _0})}(\varepsilon )\subseteq \{{\bf g}({{\mathbf \theta }});\thinspace {{\mathbf \theta }}\in {\bf \Theta }\}\) holds for all sufficiently small \( \varepsilon >0\). It remains to use the continuity of the inverse mapping \({\bf g}^{-1}\), from which it follows that \({{\mathbf \theta }}_n^*\stackrel {p}{\to } {{\mathbf \theta }}_0\) as \(n\to \infty \). Similar reasoning proves strong \(\|\cdot \| \)-consistency. \(\quad \square \)

Remark 2\(. \) The technique proposed in Theorem 1 for constructing estimators of finite-dimensional parameters in nonlinear regression problems is close to the methodology of the method of moments. In fact, Theorem 1 suggests equating the values of the regression function at certain points of its domain of definition to the corresponding values of its consistent nonparametric estimator. In this case, the number of such equations (or, equivalently, of the points indicated above) must coincide with the dimension of the parameter of the regression function. This is exactly what happens in the method of moments, where, to construct parameter estimators, the true moments (as functions of the parameter under consideration) are equated to the corresponding sample moments, which, in turn, are consistent estimators of the true moments. So, by analogy with the method of moments, the approach proposed in Theorem 1 can be stated as follows. First we define the system of equations

$$ f_{\theta }({\bf t}_j) = f_n^*({\bf t}_j),\quad j=1,\ldots ,\dim {\mathbf \theta }=m.$$
(2)

The points \(\{{\bf t}_j\} \) are chosen so that this system is uniquely solvable and the inverse mapping is continuous. Here we do not discuss the issue of the optimal choice of the points \(\{{\bf t}_j\}\).

Regarding assumption \(\rm (II)\), we consider examples of nonparametric kernel estimators that are \(\|\cdot \|_{pw} \)-consistent under condition (D) alone. First of all, we define a smoothing kernel \(K({\bf t})\), \({\bf t}\in \mathbb R^k\), as the density of a centrally symmetric distribution with support in \([-1,1]^k\). We assume that the function \(K({\bf t})\) satisfies the Lipschitz condition everywhere on \(\mathbb R^k\). We also need the notation \(K_{h}({\bf t})=h^{-k} K(h^{-1}{\bf t}) \), \(h\in (0,1) \), which is a probability density with support in \([-h,h]^k \).

Finally, we define an estimator for the function \(f_{\theta _0} \) in assumption \(\rm (II) \), by the formula

$$ f^*_{n,h}({\bf t})=\frac {\displaystyle \sum \nolimits _{i=1}^nX_{i}K_{h}({\bf t}-{\bf z}_{i})\Lambda _k({\cal P}_i)}{\displaystyle \sum \nolimits _{i=1}^nK_{h}({\bf t}-{\bf z}_{i})\Lambda _k({\cal P}_i)}, $$
(3)

where \(h\) is the so-called observation window width, which tends to zero at a certain rate as \(n \) grows, \(\Lambda _k(\cdot ) \) is the Lebesgue measure in \(\mathbb R^k \), and the measurable subsets \(\{{\cal P}_i, i=1,\ldots ,n\} \) form a finite partition of the cube \([0,1]^k \) such that each subset contains exactly one point from the set of regressors \(\{{\bf z}_i\}\) and the maximum diameter of the partition elements \({\cal P}_i \) tends to zero as the sample size \(n \) increases (if condition (D) is satisfied, such a partition obviously exists; see [17]). From a practical point of view, the specified partition with marked points can be organized, for example, by the method of successive coordinate-median sections or using a Voronoi mosaic (see details in [17]). A univariate version of this estimator was defined in [2] as follows:

$$ f^*_{n,h}(t)=\frac {\displaystyle \sum \nolimits _{i=1}^nX_{ni}K_{h}(t-z_{n:i})\Delta z_{ni}}{\displaystyle \sum \nolimits _{i=1}^nK_{h}(t-z_{n:i})\Delta z_{ni}},$$
(4)

where \(z_{n:0}=0 \), \(z_{n:1}\leq \ldots \leq z_{n:n}\) are the order statistics obtained from the sample \( \{z_i;\thinspace i =1,\ldots , n\}\), \(\Delta z_{ni}=z_{n:i}-z_{n:i-1}\), \(i=1,\ldots ,n \); the random variables \(X_{ni} \) are the responses from the regression equation (1) associated with the order statistics \(z_{n:i} \), respectively.
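
To make the construction concrete, here is a minimal numerical sketch of the univariate estimator (4), assuming NumPy; the Epanechnikov kernel and all names are our illustrative choices, and it is assumed that at least one design point lies within distance \(h \) of the evaluation point \(t \).

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov density on [-1, 1]: symmetric and Lipschitz, as required."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def f_star_1d(t, x, z, h, K=epanechnikov):
    """Kernel estimator (4): a local constant fit with spacing weights Delta z_{ni}."""
    order = np.argsort(z)
    zs, xs = z[order], x[order]
    dz = np.diff(np.concatenate(([0.0], zs)))      # Delta z_{ni} = z_{n:i} - z_{n:i-1}
    w = K((t - zs) / h) / h * dz                   # K_h(t - z_{n:i}) * Delta z_{ni}
    return np.sum(xs * w) / np.sum(w)
```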

Let us also introduce into consideration a classical Nadaraya–Watson estimator by the formula

$$ \widehat f_{n,h}({\bf t})=\frac {\displaystyle \sum \nolimits _{i=1}^nX_{i}K_{h}({\bf t}-{\bf z}_{i})}{\displaystyle \sum \nolimits _{i=1}^nK_{h}({\bf t}-{\bf z}_{i})}.$$
(5)

It is known (see [2, 12, 17, 18]) that, under condition (D) alone, there exist sequences \(h\equiv h_n\to 0\) for which all three estimators above are \(\|\cdot \|_{pw}\)-consistent, and the estimators (3) and (4) are also \(\|\cdot \|_{sup}\)-consistent. Further, by default, it is assumed that, when the kernel estimators (3) and (5) are used, the window width \( h\equiv h_n\) tends to zero at a known rate as \(n\to \infty \). In Section 3 of this paper, formula (11) determines the optimal window width for the kernel estimator (3) (see also Remark 5 below).
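
For completeness, an analogous sketch of the multivariate Nadaraya–Watson estimator (5), with a product Epanechnikov kernel as an illustrative choice of \(K \) (any Lipschitz symmetric density supported in \([-1,1]^k \) would do):

```python
import numpy as np

def epanechnikov(u):
    """One-dimensional Epanechnikov density on [-1, 1]."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def nadaraya_watson(t, x, z, h):
    """Estimator (5): t is a point of [0,1]^k, z is an (n, k) array of regressors."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    u = (t[None, :] - z) / h                                 # shape (n, k)
    w = np.prod(epanechnikov(u), axis=1) / h ** z.shape[1]   # product kernel K_h(t - z_i)
    return np.sum(x * w) / np.sum(w)
```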

Remark 3\(. \) Note that the following relations hold:

$$ \begin {gathered} {f^*_{n,h}}({\bf t})={\rm arg}\min \limits _{a} \sum \limits ^n_{i=1}(X_{i}-a)^2K_{h}({\bf t}-{\bf z}_{i})\Lambda ({\cal P}_i),\\ {\widehat f_{n,h}}({\bf t})={\rm arg}\min \limits _{a} \sum \limits ^n_{i=1}(X_{i}-a)^2K_{h}({\bf t}-{\bf z}_{i}); \end {gathered} $$

i.e., the kernel estimator \(f^*_{n,h}({\bf t}) \) (as well as the classical Nadaraya–Watson estimator \( \widehat f_{n,h}({\bf t})\)) is a weighted least squares estimator and belongs to the class of local constant estimators, but with weights different from those used in the construction of Nadaraya–Watson estimators.
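
Indeed, for fixed \({\bf t} \), both minimization problems are quadratic in \(a \) with nonnegative weights \(w_i \) (equal to \(K_{h}({\bf t}-{\bf z}_{i})\Lambda ({\cal P}_i) \) and to \(K_{h}({\bf t}-{\bf z}_{i}) \), respectively), and setting the derivative to zero yields the weighted mean:

$$ \frac {d}{da}\sum \limits _{i=1}^n(X_{i}-a)^2w_i=-2\sum \limits _{i=1}^n(X_{i}-a)w_i=0 \quad \Longrightarrow \quad a=\frac {\sum \nolimits _{i=1}^nX_{i}w_i}{\sum \nolimits _{i=1}^nw_i}, $$

which coincides with (3) and (5) under the corresponding choice of the weights \(w_i \).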

Let us give examples of estimators constructed using the above theorem.

Example 1\(. \) We consider Model (1) with the regression function

$$f_{\bf \theta }({\bf t})={\theta }_1t_1^{{\theta }_2}t_2^{{\theta }_3}, $$

where \({\bf t}=(t_1,t_2)\in {\mathbb R}^2_+ \) and \({{\mathbf \theta }}=(\theta _1,\theta _2,\theta _3)\in {\mathbb R}^3_+\) are vectors with positive coordinates. This is the so-called Cobb-Douglas model, quite popular in econometrics (see, for example, [10]). Consider a continuous mapping from \((C[0,1]^{2}, \|\cdot \|_{pw}) \) to \({\mathbb R}^3_+ \) defined by the formula

$$ G(f)=(f(2^{-1},2^{-1}),f(2^{-1},3^{-1}),f(3^{-1},3^{-1})). $$

We now show that the superposition \({\bf g}({\bf \theta })\equiv G(f_{\bf \theta })\) is a homeomorphism from \({\mathbb R}^3_+\) onto the conic domain \(C^3_+=\{(r_1,r_2,r_3)\in {\mathbb R}^3_+:\thinspace 0<r_3<r_2<r_1\}\). It is enough to prove that \({\bf g}({{\mathbf \theta }}) \) is a bijection: continuity of this mapping is obvious, and continuity of the inverse follows from the explicit formulas (6) below. Indeed, consider the system of equations

$$ \begin {cases} f_{\bf \theta }(2^{-1},2^{-1})\equiv {\theta }_12^{-{\theta }_2}2^{-{\theta }_3}=r_1, \\ f_{\bf \theta }(2^{-1},3^{-1})\equiv {\theta }_12^{-{\theta }_2}3^{-{\theta }_3}=r_2, \\ f_{\bf \theta }(3^{-1},3^{-1})\equiv {\theta }_13^{-{\theta }_2}3^{-{\theta }_3}=r_3, \end {cases}$$

where \((r_1,r_2,r_3) \) is an arbitrary point from \(C^3_+ \). Taking logarithms of this system of equations, we reduce it to the equivalent form

$$ \begin {cases} \widetilde \theta _1-\theta _2\log 2-\theta _3\log 2=s_1, \\ \widetilde \theta _1-\theta _2\log 2-\theta _3\log 3=s_2, \\ \widetilde \theta _1-\theta _2\log 3-\theta _3\log 3=s_3, \end {cases} $$

where \(\widetilde \theta _1=\log {\theta _1} \), \(s_j=\log r_j \), \(j=1,2,3 \). It is easy to check that the matrix \(\mathbb A \) of this system of linear equations is nonsingular. Its only solution is easily found by sequentially eliminating variables:

$$ \widetilde \theta ^*_{n1}=\frac {s_1\log 3-s_3\log 2}{\log (3/2)},\quad \theta ^*_{n2}=\frac {s_2-s_3}{\log (3/2)},\quad \theta ^*_{n3}=\frac {s_1-s_2}{\log (3/2)}.$$
(6)

Now for our purposes we can use, for example, the kernel estimator \(f^*_{n,h}({\bf t})\) defined in (3). Put in (6)

$$ s_1=\log f^*_{n,h}(2^{-1},2^{-1}),\quad s_2=\log f^*_{n,h}(2^{-1},3^{-1}),\quad s_3=\log f^*_{n,h}(3^{-1},3^{-1}).$$

In view of the above, on the set of elementary outcomes of asymptotically full measure, for all sufficiently large \(n \), the double inequality \(s_1>s_2>s_3 \) holds, i.e., the three-dimensional estimator \((\widetilde \theta _{n1}^*,\theta _{n2}^*,\theta _{n3}^*) \) is well defined, and by Theorem 1, it will be consistent for the three-dimensional parameter \((\log {\theta _1},{\theta _2},{\theta _3}) \). So the estimator \({\mathbf \theta }_n^*=(\exp \{\widetilde \theta _{n1}^*\},\theta _{n2}^*,\theta _{n3}^*) \) will be consistent for the initial parameter \({\mathbf \theta }=({\theta _1},{\theta _2},{\theta _3}) \).
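
The resulting explicit estimator is easy to compute; a minimal sketch (assuming NumPy; the plug-in values may come from any of the consistent kernel estimators, e.g., (3) or (5), and the function name is ours):

```python
import numpy as np

def cobb_douglas_theta(f_at_half_half, f_at_half_third, f_at_third_third):
    """Explicit plug-in estimator of (theta_1, theta_2, theta_3) via (6).
    Arguments are kernel-estimator values at (1/2,1/2), (1/2,1/3), (1/3,1/3)."""
    s1, s2, s3 = np.log([f_at_half_half, f_at_half_third, f_at_third_third])
    c = np.log(1.5)                                    # log(3/2)
    theta1 = np.exp((s1 * np.log(3.0) - s3 * np.log(2.0)) / c)
    theta2 = (s2 - s3) / c                             # exponent of t_1
    theta3 = (s1 - s2) / c                             # exponent of t_2
    return theta1, theta2, theta3

# sanity check with the true regression function, theta = (1, 2, 3):
f = lambda t1, t2: 1.0 * t1 ** 2 * t2 ** 3
print(cobb_douglas_theta(f(0.5, 0.5), f(0.5, 1 / 3), f(1 / 3, 1 / 3)))  # ~ (1.0, 2.0, 3.0)
```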

Example 2\(. \) We consider Model (1) with the regression function

$$f_{\bf \theta }(t)=\frac {{\theta }_1t}{t+{\theta }_2},$$

where \(t>0 \) and \({{\mathbf \theta }}=(\theta _1,\theta _2)\in {\mathbb R}^2_+\). This is the so-called Michaelis-Menten model, well known in biochemistry (see, for example, [5, 6]). Consider a continuous mapping from \((C[0,1], \|\cdot \|_{pw}) \) to \({\mathbb R}^2_+ \)

$$ {\bf G}(f)=(f(1),f(1/2)). $$

Now we show that the superposition \({\bf g}({{\mathbf \theta }})\equiv {\bf G}(f_{\bf \theta })\) is a homeomorphism of the open positive quadrant \({\mathbb R}^2_+\) onto the open cone \( C^2_+=\{(r_1,r_2);\thinspace r_2>0,\thinspace r_2<r_1<2r_2\} \). It is enough to prove that \({\bf g}({{\mathbf \theta }}) \) is a bijection. Indeed, consider the system of equations

$$ \begin {cases} \frac {\theta _1}{1+\theta _2}= r_1, \\ \frac {2^{-1}\theta _1}{2^{-1}+\theta _2}=r_2 \end {cases} $$

for any vector \((r_1,r_2)\in C^2_+ \). This system obviously reduces to the following system of two linear equations in two unknowns \(\theta _1 \) and \(\theta _2 \):

$$ \begin {cases} \theta _1-r_1\theta _2=r_1, \\ \theta _1-2r_2\theta _2=r_2, \end {cases}$$

whose matrix is nonsingular everywhere in the above open cone. As a result, with a known vector \((r_1,r_2) \), we obtain

$$ \theta ^*_{n1}=r_1\left (1+\frac {r_1-r_2}{2r_2-r_1}\right ),\quad \theta ^*_{n2}=\frac {r_1-r_2}{2r_2-r_1}. $$
(7)

Obviously, the constructed one-to-one mapping is bilaterally continuous, i.e., is a homeomorphism of the above open domains.

Now we can consider the kernel estimator \(f^*_{n,h}({t}) \) defined in (4) or the Nadaraya–Watson estimator \(\widehat f_{n,h}({ t }) \) defined in (5). As was already proven in [2] and [18], if condition (D) is satisfied then both of these estimators are consistent in the norm \(\|\cdot \|_{pw}\) for some \(h\equiv h_n\to 0\). So, for example, one can put in (7)

$$ r_1=f^*_{n,h}(1),\quad r_2=f^*_{n,h}(1/2). $$

In this case, the two-dimensional estimator \({\mathbf \theta }_{n}^*=(\theta ^*_{n1},\theta ^*_{n2}) \) in (7) is well defined on a set of elementary outcomes of asymptotically full measure as \(n\to \infty \), and moreover, is consistent by virtue of Theorem 1.
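
A corresponding minimal sketch for the Michaelis–Menten model (the function name and the sanity check are our illustrative additions; \(r_1 \) and \(r_2 \) are the plug-in values just described):

```python
def michaelis_menten_theta(r1, r2):
    """Explicit plug-in estimator (7); r1, r2 are kernel-estimator values
    at t = 1 and t = 1/2, expected to satisfy r2 < r1 < 2*r2."""
    theta2 = (r1 - r2) / (2.0 * r2 - r1)
    theta1 = r1 * (1.0 + theta2)
    return theta1, theta2

# sanity check with the true regression function, theta = (2, 0.5):
f = lambda t: 2.0 * t / (t + 0.5)
print(michaelis_menten_theta(f(1.0), f(0.5)))    # ~ (2.0, 0.5)
```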

Remark 4\(. \) Note that explicit estimators for the parameters of the Michaelis–Menten model were known earlier (see [22]). In particular, the estimators in [22] are constructed, in essence, owing to the intrinsic linearity of this model, and conditions like (D) are not required in [22]. However, the fundamental difference between [22] and the results of the present paper is that the regressors \(\{{\bf z}_i\} \) in [22] are nonrandom and the random errors \(\{\xi _i\} \) are independent.

Example 3\(. \) We consider the so-called logistic regression. In this case,

$$f_{\theta }({\bf t})=\left (1+ e^{-({\bf t},{\mathbf \theta })}\right )^{-1}, $$

where \(\dim {\bf t}=\dim {{\mathbf \theta }}=m \) and \((\cdot ,\cdot ) \) is the standard Euclidean inner product in \({\mathbb R}^m \).

Next, for an arbitrary set of points \(\{{\bf t}_j; \thinspace j=1,\ldots ,m\}\) from the unit \(m \)-dimensional cube, the system of equations (2) is easily reduced to the following system of linear equations:

$$ \begin {cases} ({\bf t}_1,{\mathbf \theta })=r_1, \\ \ldots \ldots \ldots \ldots \\ ({\bf t}_m,{\mathbf \theta })=r_m. \end {cases} $$

The only restriction on the vectors \(\{{\bf t}_j\} \) is their linear independence, i.e., nondegeneracy of the matrix \( \mathbb T\) of the reduced system. Now, let us put

$$r_j=\log \frac {f_n^*({\bf t}_j)}{1-f_n^*({\bf t}_j)},$$

where \(f_n^*({\bf t}) \) is any of the estimators (3) or (5). Note that, due to the \( \|\cdot \|_{pw}\)-consistency of these estimators under condition (D) alone, the inequalities \(0<f_n^*({\bf t}_j)<1 \) hold with probability tending to 1 as \(n\to \infty \) for any fixed \({\bf t}_j \). If these inequalities are violated, we set \(r_j=0 \). We finally obtain the following consistent estimator for the logistic model:

$$ {{\mathbf \theta }}^*_n={\mathbb T}^{-1} (r_1,\ldots ,r_m)^{\top }.$$
(8)
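
A minimal sketch of the estimator (8), assuming NumPy; the matrix \({\mathbb T} \) has the points \({\bf t}_j \) as rows, and the convention \(r_j=0 \) for estimator values outside \((0,1) \) follows the rule stated above. All names are illustrative.

```python
import numpy as np

def logistic_theta(T, f_values):
    """Explicit estimator (8): theta*_n = T^{-1} (r_1, ..., r_m)^T,
    where r_j = logit(f*_n(t_j)) and r_j = 0 if f*_n(t_j) lies outside (0, 1)."""
    T = np.asarray(T, dtype=float)
    f = np.asarray(f_values, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where((f > 0.0) & (f < 1.0), np.log(f / (1.0 - f)), 0.0)
    return np.linalg.solve(T, r)

# sanity check with the true regression function, theta_0 = (1, -2):
theta0 = np.array([1.0, -2.0])
T = np.array([[1.0, 0.0], [0.5, 0.5]])            # linearly independent points t_j in [0,1]^2
f_true = 1.0 / (1.0 + np.exp(-(T @ theta0)))      # exact values f_{theta_0}(t_j)
print(logistic_theta(T, f_true))                  # ~ [ 1. -2.]
```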

3. ANALYSIS OF \(\alpha _n\)-CONSISTENCY OF THE ESTIMATORS

Now we discuss the question of refining the result of Theorem 1. Recall that an estimator \( {{\mathbf \theta }}^*_n\) of the parameter \({\mathbf \theta }\in {\mathbb R}^m\) is called \(\alpha _n \)-consistent if \(\alpha _n({{\mathbf \theta }}^*_n-{{\mathbf \theta }})\stackrel {p}{\to } {\bf 0} \) and \(\alpha _n\to \infty \) as \(n\to \infty \). This definition is easily extended to the case of infinite-dimensional parameters.

Definition\(. \) Let \(({\cal C}, d) \) be a separable metric space. A sequence of random elements \( g_n^*\in {\cal C}\) is called an \(\alpha _n \)-consistent estimator of a random element \(g\in {\cal C} \) if \(\alpha _nd(g_n^*,g)\stackrel {p}{\to } 0\) and \(\alpha _n\to \infty \) as \(n\to \infty \).

We now present a refinement of Theorem 1 under appropriate detailing of the restrictions.

Theorem 2\(. \) Under the conditions of Theorem 1, let the mappings \(\bf G\) and \( {\bf g}^{-1}\) satisfy the Lipschitz condition, and let there exist a nonparametric \(\alpha _n \)-consistent \(( \)in the norm of the space \((C[0,1]^{k}, \|\cdot \|)\)\() \) estimator \(f^*_n \) of the unknown regression function \(f_{\theta _0} \). Then the estimator \( {{\mathbf \theta }}_n^*\) is \(\alpha _n\)-consistent for the true parameter \({{\mathbf \theta }}_0 \).

The proof immediately follows from the construction of the estimator \({{\mathbf \theta }}_n^*={\bf g}^{-1}({\bf G}(f_n^*)) \) in Theorem 1 and the above definition of an \(\alpha _n \)-consistent nonparametric estimator \(f^*_n \). \(\quad \square \)

As is easy to see, the main condition in this theorem is the existence of an \(\alpha _n \)-consistent nonparametric estimator \(f^*_n \). The following statement gives an example of such a nonparametric estimator.

Theorem 3\(. \) For each \( {{\mathbf \theta }}\in {\bf \Theta } \) in Model (1), let the regression function \(f_{\theta }({\bf t}) \) satisfy the Lipschitz condition and not be identically constant. Then, under condition (D), the kernel estimator (3) is \( \alpha _n\)-consistent for \( \alpha _n=o(h_n^{-1})\), where

$$ h_n=\left ({\mathbb E}(\varepsilon _n^{kp/2})\right )^{\frac {1}{p(k/2+1)+k}}.$$

Proof. In [17], it is shown that, under the conditions of the model under consideration, for the kernel estimator (3) the following inequality is valid with probability 1:

$$ \|f^*_{n,h}-f_{\bf \theta _0}\|_{sup}\le \omega _{f_{\theta _0}}(h)+\zeta _n(h),$$
(9)

where \(\omega _{f_{\theta _0}}(h)\) is the modulus of continuity of the true regression function (which is strictly positive for each \(h>0\) since \(f_{\theta _0}\) is not identically constant), and the random variable \(\zeta _n(h)\) has the following order in probability:

$$ \zeta _n(h)=O_p\left (\left (h^{-k(p/2+1)}\thinspace {\mathbb E}(\varepsilon _n^{kp/2})\right )^{1/p}\right ); $$

here the upper bound in the definition of the symbol \(O_p(\cdot ) \) depends on \(k \), \(p \), \(M_p \), and the kernel \(K \). The optimal window width \(h\equiv h_n \), which equalizes the orders of both terms on the right-hand side of (9), is found as the solution of the equation

$$ {\mathbb E}(\varepsilon _n^{kp/2})=h^{k(p/2+1)}\omega ^p_{f_{\theta _0}}(h).$$
(10)

For a Lipschitz regression function, we thus obtain a representation for the optimal (in order of smallness) window width:

$$ h_n=\left ({\mathbb E}(\varepsilon _n^{kp/2})\right )^{\frac {1}{p(k/2+1)+k}},$$
(11)

which is what needed to be shown. \(\quad \square \)

Thus, if we put \(\alpha _n=o(h_n^{-1}) \) then, in all the above examples, substituting the nonparametric kernel estimator \(f^*_{n,h_n}\) and using Theorems 2 and 3, we obtain \(\alpha _n\)-consistent estimators for the multidimensional parameters of the nonlinear regression models under consideration. For example, for the estimator (8), this statement follows from the fact that the function \(\phi (z)=\log \frac {z}{1-z} \) satisfies the Lipschitz condition in a neighborhood of each point of the open interval \((0,1)\) and that, in addition, by virtue of (8), on a set of elementary outcomes of asymptotically full measure the inequality

$$ \|{\mathbf \theta }_n^*-{\mathbf \theta }_0\|\le C\max _{j\le m}|f_n^*({\bf t}_j)-f_{\theta _0}({\bf t}_j)| $$
(12)

holds for all sufficiently large \(n \), where the nonrandom constant \( C \) depends on the collection \(\{{\bf t}_j\} \) and \(m \).

Remark 5\(. \) If under condition (D) the minimal radius of an \(\varepsilon \)-net admits a deterministic upper bound \(\varepsilon _n\le \hat \varepsilon _n\), then formula (11) for the window width can be transformed into the bound

$$ h_n\leq \widehat h_n\equiv \hat \varepsilon _n^{\frac {k}{k+2+2k/p}}. $$

Moreover, for a sufficiently large \(p \) (for example, when the noise is Gaussian), the order of smallness of the window width can be made arbitrarily close to the value \(\hat \varepsilon _n^{{k}/(k+2)}\). As an example, we can consider a one-dimensional (\(k=1\)) equidistant design, when in the estimator (4) we should put \(z_{n:i}=i/n \) and \(\hat \varepsilon _n=\Delta z_{ni}=1/n\). In this case, for large \(p \), \(\hat h_n\approx \hat \varepsilon _n^{{1}/{3}}=n^{-{1}/{3}}\). It is also worth noting that, in the case of independent identically distributed random regressors \(\{{\bf z}_i; \; i=1,\ldots ,n\}\) in Model (1), it is not difficult to establish asymptotic normality of the estimators considered above using known results on asymptotic normality of the kernel estimators (3)–(5). For example, for \(k=1 \), the normalization of the residual \(|f_n^*({\bf t}_j)-f_{\theta _0}({\bf t}_j)|\) under broad conditions has order \(\sqrt {n h_n}\) for some sequence \(h_n \) (the window width) tending to zero at a power-law rate in \(n \), which gives an upper (unattainable) bound for the normalizing factor \(\alpha _n\) in the definition of \( \alpha _n\)-consistency.
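
As a small numerical illustration of this remark (assuming the deterministic bound \(\hat \varepsilon _n=1/n \) of the equidistant design; the function name is ours, and the formula is simply (11) with \({\mathbb E}(\varepsilon _n^{kp/2}) \) replaced by \(\hat \varepsilon _n^{kp/2} \)):

```python
def window_width_bound(eps_hat, k, p):
    """Upper bound for the optimal window width (11) when eps_n <= eps_hat:
    h_n <= eps_hat ** (k*p/2 / (p*(k/2 + 1) + k))."""
    return eps_hat ** (0.5 * k * p / (p * (0.5 * k + 1.0) + k))

# k = 1 equidistant design, eps_hat = 1/n; the exponent approaches 1/3 as p grows
for n in (10 ** 2, 10 ** 4, 10 ** 6):
    print(n, window_width_bound(1.0 / n, k=1, p=20), n ** (-1 / 3))
```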

Remark 6\(. \) The approach proposed here for constructing \(\alpha _n \)-consistent estimators of nonlinear regression parameters is somewhat inferior in accuracy to the estimators in [16] and [19], where another method was proposed for constructing \(\alpha _n \)-consistent and asymptotically normal estimators without using nonparametric kernel estimators. Under broad conditions, the value of \(\alpha _n \) in the mentioned papers can be arbitrarily close to \(\sqrt n \), while the approach based on the bound (12) yields \(\alpha _n \)-consistent estimators with the order of growth of \(\alpha _n \) no more than \(n^{1/3} \) (see Remark 5). However, the explicit estimators obtained by the approach of Theorem 1 in many models look much simpler than the estimators proposed in [16] and [19] for the same models. In addition, it is important to note that the accuracy of these estimators can be significantly improved with the help of one-step procedures, for which preliminary \(n^{1/4}\)-consistent estimators can be used under a wide range of conditions (see, for example, [13,14,15]). So the estimators proposed in the present paper can be used as preliminary (initial) estimators for multistep Newton-type procedures.