1. INTRODUCTION AND STATEMENT OF THE PROBLEM

We consider the following regression model:

$$ X_i=f_{\bf \theta }({\bf z}_i)+\xi _i, \qquad i=1,\ldots ,n,$$
(1)

where \(\{f_{\theta }({\bf t});\thinspace {{\mathbf \theta }}=(\theta _1,\ldots ,\theta _m)\in {\bf \Theta }\}\) is a parametric family of real-valued continuous functions of \(k \) variables, the \(k \)-dimensional vectors \({\bf t}=(t_1,\ldots ,t_k) \) belong to a Jordan-measurable compact set \({\cal P}\subset {\mathbb R}^k\) without isolated points, and \(\bf \Theta \) is an open set in \({\mathbb R}^m \). The collection of regressors \(\{{\bf z}_i;\thinspace i=1,\ldots ,n\}\) consists of observable random \(k \)-dimensional vectors with unknown (in general) distributions and with values in \(\cal P\), not necessarily independent or identically distributed. This collection of regressors may be considered as an array, i.e., the vectors \( \{{\bf z}_i; \; i=1,\ldots ,n\}\) may depend on \(n \). In particular, this scheme includes models with fixed regressors (for example, an equidistant design). The errors \(\{\xi _i;\thinspace i=1,\ldots ,n\}\) are centered random variables (additional conditions will be given at the end of this section).

The task is to construct explicit estimators for the unknown true value of the \(m \)-dimensional parameter \({{\mathbf \theta }}\in {\bf \Theta }\) from a sample of \((k+1) \)-dimensional observations \(\{(X_i, {\bf z}_i);\thinspace i=1,\ldots ,n\}\). Note that the problem of constructing explicit estimators in nonlinear regression models with certain restrictions on their parameters has already attracted the attention of researchers (see [1, 3, 7, 8, 16, 19, 20, 21, 23, 25, 27, 28, 34, 36, 37]). First of all, we are talking about models in which estimation can be reduced to linear regression problems. Traditionally (see, for example, [7, 8, 25, 27]), intrinsically linear (or nonintrinsically nonlinear) models are regression models for which the regression equation can be reduced, by one or another transformation of the responses \(\{X_i\}\) and of the initial parameter \( {\mathbf \theta }\), to a linear one. As a rule, these are models with multiplicative noise (see, for example, [27]). In [20], the definition of intrinsically linear models was clarified and expanded, and it was established that a number of well-known nonlinear regression models with additive errors satisfy this definition. For such models, explicit parameter estimators are constructed using the methods of linear regression analysis.

It is worth noting that the ability to transform the original regression model to a linear one (especially in the case of additive errors) is a rare exception rather than the rule. In [16] and [19], an approach was proposed that, under broad conditions, allows one to construct explicit consistent estimators of the parameters of nonlinear regression models that are not intrinsically linear. In the present paper, we further expand this class of nonlinear regression models.

It is important to note that, in addition to being of independent interest, the construction of explicit consistent estimators for nonlinear regression models is of exceptional interest for one-step estimation. It is well known that, in nonlinear regression problems, asymptotically optimal estimators obtained, for example, by the quasi-likelihood, least squares, and maximum likelihood methods are often given implicitly as solutions to certain equations (see, for example, the monographs [3, 7, 9, 11, 25, 30, 34, 36, 37]). Moreover, the situation where the equation determining an estimator has several roots is quite typical (see, for example, [31,32,33]). This circumstance is the main problem that complicates the use of numerical methods: if the initial approximation of the parameter is chosen unsuccessfully, iterative procedures find only the root closest to the starting point, which need not be the one closest to the parameter. One way to work around this problem is to use so-called one-step estimators. The idea of one-step estimation, going back to R. Fisher’s works, is as follows: the starting point of an iterative Newton-type procedure is not an arbitrary point but a preliminary consistent estimator that converges to the parameter at a certain rate. It turns out that, in this case, one step of the iterative procedure is often sufficient to obtain an explicit estimator (the so-called one-step estimator) that has the same asymptotic accuracy as the statistic of interest (see, for example, [35]). In fact, if we have some preliminary consistent estimator, then Newton-type procedures allow us to isolate the root of the equation of interest that approximates the parameter.

In recent years, interest in one-step estimation in the statistical literature has only been growing, and the bibliography in this area is very extensive (a number of bibliographic references can be found, for example, in [14]). The importance of developing a one-step estimation methodology specifically for nonlinear regression problems is emphasized, for example, in the monograph [31]. In various formulations of nonlinear regression problems, one-step estimators are studied, for example, in [4, 23, 24, 26, 29, 31], but the existence of preliminary estimators in these papers (with the exception of [23], devoted to fractional-linear models) is only postulated. In the papers [13,14,15] related to one-step estimation in nonlinear regression, the technique from [16] and [19] is used to construct preliminary estimators. The new estimators proposed in the present paper can also be used as preliminary estimators in one-step estimation procedures.

Let us return to the conditions for Model (1). We additionally assume that the random errors \(\{\xi _i;\thinspace i=1,\ldots ,n\}\) form a sequence of martingale differences under the condition

$$ M_p=\sup _{i\leq n}{\mathbb E}|\xi _i|^p<\infty \quad \mbox {for some} \thinspace \thinspace p>k \thinspace \thinspace \mbox {and}\thinspace \thinspace p\ge 2,$$

where \(M_p \) does not depend on \(n \). It is also assumed that the random variables \(\{\xi _i\} \) are independent of \(\{{\bf z}_i\} \) but may depend on \(n \). Next, we assume that, with probability \(1 \), each observed point \({\bf z}_i \) from the set of regressors has multiplicity \(1 \) in the sample. Note that if the sample \(\{{\bf z}_i; \thinspace i=1,\ldots ,n\}\) contained multiple points then we could consider the arithmetic mean of the responses \(X_i \) with identical regressors and thereby reduce the problem to the original one. It is important to note that if the multiplicity of some point from the set of regressors grows with \(n \) then the problem reduces to a classical statistical formulation of the method of moments. Therefore, we exclude such cases from consideration. So, even in the presence of nonrandom regressors, we reject the possibility of “full control” over them, when one could observe a growing number of responses at the same point from the set of regressors.

For each \(n\), we denote by \(\varepsilon _n \) the minimal possible radius of an \(\varepsilon \)-net formed by the set of regressors \(\{{\bf z}_1,\ldots ,{\bf z}_n\} \) in the compact set \(\cal P \). Then the only restriction on the regressors will be the following condition:

(D) \(\varepsilon _n \stackrel {p}{\to } 0\) as \(n\to \infty \).

Remark 1\(. \) If all \({\bf z}_i \) do not depend on \(n \) then convergence in probability in condition (D) is equivalent to almost sure convergence due to the monotonicity of the sequence \(\{\varepsilon _n\}\). For example, if \(\{{\bf z}_i;\thinspace i=1,2,\ldots \}\) is a sequence of identically distributed random vectors (not necessarily stationary) satisfying the strong mixing condition and the compact set \(\cal P \) is the support of the marginal distribution then condition (D) is satisfied. In [12] and [17], examples are given of stronger dependence between the regressors, in which all known weak dependence conditions fail but condition (D) still holds.
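
For illustration, \(\varepsilon _n \) is simply the covering radius \(\sup _{{\bf t}\in {\cal P}}\min _{i\leq n}\|{\bf t}-{\bf z}_i\| \) of the design points. The following minimal sketch (not from the paper) approximates this quantity on a grid, assuming \({\cal P}=[0,1]^k \), the Euclidean norm, and an i.i.d. uniform design; all names and parameters are illustrative only.

```python
import numpy as np

def covering_radius(z, grid_per_dim=100):
    """Approximate eps_n = sup_{t in [0,1]^k} min_i ||t - z_i|| (Euclidean norm)
    by maximizing over a regular grid; a purely illustrative helper."""
    n, k = z.shape
    axes = [np.linspace(0.0, 1.0, grid_per_dim)] * k
    grid = np.stack(np.meshgrid(*axes, indexing="ij"), -1).reshape(-1, k)
    radius = 0.0
    for chunk in np.array_split(grid, max(1, len(grid) // 1000)):
        d = np.min(np.linalg.norm(chunk[:, None, :] - z[None, :, :], axis=-1), axis=1)
        radius = max(radius, float(d.max()))
    return radius

rng = np.random.default_rng(0)
z_all = rng.random((5000, 2))                          # one sequence z_1, z_2, ... with k = 2
for n in (50, 500, 5000):
    print(n, round(covering_radius(z_all[:n]), 4))     # eps_n decreases, so condition (D) holds
```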

2. METHODOLOGY FOR OBTAINING EXPLICIT ESTIMATORS

For simplicity of presentation, we further set \({\cal P}=[0,1]^k \). Denote by \(C[0,1]^{k} \) the space of continuous functions on \([0,1]^{k} \). Suppose that the space \(C[0,1]^{k} \) is endowed with some norm \(\|\cdot \| \). So, in what follows, we will consider the linear normed space \( (C[0,1]^{k},\|\cdot \|)\), which is assumed to be separable with respect to the metric generated by the norm \(\|\cdot \| \). We will be interested mainly in the following two cases:

$$ \|f\|\equiv \|f\|_{sup} =\sup _{{\bf t}\in [0,1]^{k}}|f({\bf t})|, \qquad \quad \|f\|\equiv \|f\|_{pw}=\sum \limits _{i=1}^{\infty }\frac {|f({\bf t}_i)|}{2^i}, $$

where the summation is taken over all points \({\bf t}_i\in [0,1]^{k}\) with rational coordinates numbered in an arbitrary manner. Note that convergence in the \(\|\cdot \|_{pw} \)-norm is equivalent to pointwise convergence of functions in \( C[0,1]^{k}\).

The proposed approach is based on the use of nonparametric kernel estimators of the regression function. We need the following assumptions:

(I) There exists a continuous mapping \({\bf G}: (C[0,1]^{k}, \|\cdot \|)\to {\mathbb R}^m \) for which the vector-valued function \({\bf g}({{\mathbf \theta }})={\bf G}(f_{\bf \theta }) \) is a homeomorphism of the open set \(\bf \Theta \) onto some open domain in \({\mathbb R}^m \);

(II) There exists a \(\|\cdot \|\)-consistent \(( \)or strongly \(\|\cdot \| \)-consistent\() \) nonparametric estimator \(f_n^*({\bf t})\in (C[0,1]^k, \|\cdot \|)\) of the regression function \(f_{{\mathbf \theta }_0}({\bf t})\), where \({{\mathbf \theta }}_0 \) is the true value of the parameter in (1).

We now define an estimator by the formula

$$ {{\mathbf \theta }}_n^*={\bf g}^{-1}({\bf G}(f_n^*)),$$

where \({\bf g}^{-1} \) is the inverse transform for \(\bf g \); in the case where \({\bf G}(f_n^*)\notin \{{\bf g}({{\mathbf \theta }});\thinspace {{\mathbf \theta }}\in \bf \Theta \} \), we set \({\bf g}^{-1}({\bf G}(f_n^*))={\bf 0}\) by definition.

The following assertion is valid.

Theorem 1\(. \) If assumptions (I) and (II) are met then the estimator \({\bf \theta }_n^* \) will be consistent (strongly consistent in accordance with assumption (II)) for \({\bf \theta }_0 \).

The proof of this statement is quite simple. Indeed, the \( \|\cdot \|\)-consistency of the nonparametric estimator \( f_n^*({\bf t})\) means that

$$ \|f_n^*-f_{{\bf \theta }_0}\|\stackrel {p}{\to } 0\thinspace \thinspace \thinspace \mbox {as}\thinspace \thinspace \thinspace n\to \infty .$$

In other words, due to the continuity of the mapping \(\bf G \) in the norm \(\|\cdot \| \) of the space \(C[0,1]^k \), for any given \(\varepsilon >0 \), with probability tending to 1 one has \({\bf G}(f_n^*)\in S_{g({\bf \theta _0})}(\varepsilon )\), where \(S_{g({\bf \theta _0})}(\varepsilon )\) is the open ball of radius \(\varepsilon \) centered at the point \({\bf g}({{\mathbf \theta }_0}) \) in \({\mathbb R}^m \). Since the image of an open set under a homeomorphism is open, the inclusion \(S_{g({\bf \theta _0})}(\varepsilon )\subseteq \{{\bf g}({{\mathbf \theta }});\thinspace {{\mathbf \theta }}\in {\bf \Theta }\}\) holds for all sufficiently small \( \varepsilon >0\). It remains to use the continuity of the inverse mapping \({\bf g}^{-1}\), from which it follows that \({{\mathbf \theta }}_n^*\stackrel {p}{\to } {{\mathbf \theta }}_0\) as \(n\to \infty \). Similar reasoning proves strong \(\|\cdot \| \)-consistency. \(\quad \square \)

Remark 2\(. \) The technique proposed in Theorem 1 for constructing estimators of finite-dimensional parameters in nonlinear regression problems is close to the methodology of the method of moments. In fact, Theorem 1 suggests equating the values of the regression function at certain points of its domain of definition to the corresponding values of its consistent nonparametric estimator. In this case, the number of such equations (or, equivalently, of the points indicated above) must coincide with the dimension of the parameter of the regression function. This is exactly what happens in the method of moments, where, to construct parameter estimators, the true moments (as functions of the parameter under consideration) are equated to the corresponding sample moments, which, in turn, are consistent estimators of the true moments. So, by analogy with the method of moments, the approach proposed in Theorem 1 can be stated as follows. First we define the system of equations

$$ f_{\theta }({\bf t}_j) = f_n^*({\bf t}_j),\quad j=1,\ldots ,\dim {\mathbf \theta }=m.$$
(2)

The points \(\{{\bf t}_j\} \) are chosen so that this system is uniquely solvable and the inverse mapping is continuous. Here we do not discuss the issue of the optimal choice of the points \(\{{\bf t}_j\}\).

Regarding assumption \(\rm (II)\), we consider examples of nonparametric kernel estimators that are \(\|\cdot \|_{pw} \)-consistent under condition (D) alone. First of all, we define a smoothing kernel \(K({\bf t})\), \({\bf t}\in \mathbb R^k\), as the density of a centrally symmetric distribution with support in \([-1,1]^k\). We assume that the function \(K({\bf t})\) satisfies the Lipschitz condition everywhere on \(\mathbb R^k\). We also need the notation \(K_{h}({\bf t})=h^{-k} K(h^{-1}{\bf t}) \), \(h\in (0,1) \), which is a probability density with support in \([-h,h]^k \).

Finally, we define an estimator for the function \(f_{\theta _0} \) in assumption \(\rm (II) \), by the formula

$$ f^*_{n,h}({\bf t})=\frac {\displaystyle \sum \nolimits _{i=1}^nX_{i}K_{h}({\bf t}-{\bf z}_{i})\Lambda _k({\cal P}_i)}{\displaystyle \sum \nolimits _{i=1}^nK_{h}({\bf t}-{\bf z}_{i})\Lambda _k({\cal P}_i)}, $$
(3)

where \(h\) is the so-called observation window width, which tends to zero at a certain rate as \(n \) grows, \(\Lambda _k(\cdot ) \) is the Lebesgue measure in \(\mathbb R^k \), and the measurable subsets \(\{{\cal P}_i, i=1,\ldots ,n\} \) form a finite partition of the cube \([0,1]^k \) such that each subset contains exactly one point from the set of regressors \(\{{\bf z}_i\}\) and the maximum diameter of the partition elements \({\cal P}_i \) tends to zero as the sample size \(n \) increases (if condition (D) is satisfied, such a partition obviously exists; see [17]). From a practical point of view, the specified partition with marked points can be organized, for example, by the method of successive coordinate-median sections or using a Voronoi mosaic (see details in [17]). A univariate version of this estimator was defined in [2] as follows:

$$ f^*_{n,h}(t)=\frac {\displaystyle \sum \nolimits _{i=1}^nX_{ni}K_{h}(t-z_{n:i})\Delta z_{ni}}{\displaystyle \sum \nolimits _{i=1}^nK_{h}(t-z_{n:i})\Delta z_{ni}},$$
(4)

where \(z_{n:0}=0 \), \(z_{n:1}\leq \ldots \leq z_{n:n}\) are the order statistics obtained from the sample \( \{z_i;\thinspace i =1,\ldots , n\}\), \(\Delta z_{ni}=z_{n:i}-z_{n:i-1}\), \(i=1,\ldots ,n \); the random variables \(X_{ni} \) are the responses from the regression equation (1) associated with the order statistics \(z_{n:i} \), respectively.
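
To make the construction concrete, here is a minimal numerical sketch of the univariate estimator (4), assuming NumPy; the Epanechnikov kernel and all names are our illustrative choices, and it is assumed that at least one design point lies within distance \(h \) of the evaluation point \(t \).

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov density on [-1, 1]: symmetric and Lipschitz, as required."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def f_star_1d(t, x, z, h, K=epanechnikov):
    """Kernel estimator (4): a local constant fit with spacing weights Delta z_{ni}."""
    order = np.argsort(z)
    zs, xs = z[order], x[order]
    dz = np.diff(np.concatenate(([0.0], zs)))      # Delta z_{ni} = z_{n:i} - z_{n:i-1}
    w = K((t - zs) / h) / h * dz                   # K_h(t - z_{n:i}) * Delta z_{ni}
    return np.sum(xs * w) / np.sum(w)
```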

Let us also introduce into consideration a classical Nadaraya–Watson estimator by the formula

$$ \widehat f_{n,h}({\bf t})=\frac {\displaystyle \sum \nolimits _{i=1}^nX_{i}K_{h}({\bf t}-{\bf z}_{i})}{\displaystyle \sum \nolimits _{i=1}^nK_{h}({\bf t}-{\bf z}_{i})}.$$
(5)

It is known (see [2, 12, 17, 18]) that, under condition (D) alone, there exist sequences \(h\equiv h_n\to 0\) for which all three estimators above are \(\|\cdot \|_{pw}\)-consistent, and the estimators (3) and (4) are also \(\|\cdot \|_{sup}\)-consistent. Further, by default, it is assumed that, when the kernel estimators (3) and (5) are used, the window width \( h\equiv h_n\) tends to zero at a known rate as \(n\to \infty \). In Section 3 of this paper, formula (11) determines the optimal window width for the kernel estimator (3) (see also Remark 5 below).
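
For completeness, an analogous sketch of the multivariate Nadaraya–Watson estimator (5), with a product Epanechnikov kernel as an illustrative choice of \(K \) (any Lipschitz symmetric density supported in \([-1,1]^k \) would do):

```python
import numpy as np

def epanechnikov(u):
    """One-dimensional Epanechnikov density on [-1, 1]."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1.0)

def nadaraya_watson(t, x, z, h):
    """Estimator (5): t is a point of [0,1]^k, z is an (n, k) array of regressors."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    u = (t[None, :] - z) / h                                 # shape (n, k)
    w = np.prod(epanechnikov(u), axis=1) / h ** z.shape[1]   # product kernel K_h(t - z_i)
    return np.sum(x * w) / np.sum(w)
```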

Remark 3\(. \) Note that the following relations hold:

$$ \begin {gathered} {f^*_{n,h}}({\bf t})={\rm arg}\min \limits _{a} \sum \limits ^n_{i=1}(X_{i}-a)^2K_{h}({\bf t}-{\bf z}_{i})\Lambda ({\cal P}_i),\\ {\widehat f_{n,h}}({\bf t})={\rm arg}\min \limits _{a} \sum \limits ^n_{i=1}(X_{i}-a)^2K_{h}({\bf t}-{\bf z}_{i}); \end {gathered} $$

i.e., the kernel estimator \(f^*_{n,h}({\bf t}) \) (as well as the classical Nadaraya–Watson estimator \( \widehat f_{n,h}({\bf t})\)) is a weighted least squares estimator and belongs to the class of local constant estimators, but with weights different from those used in the construction of Nadaraya–Watson estimators.
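
Indeed, for fixed \({\bf t} \), both minimization problems are quadratic in \(a \) with nonnegative weights \(w_i \) (equal to \(K_{h}({\bf t}-{\bf z}_{i})\Lambda ({\cal P}_i) \) and to \(K_{h}({\bf t}-{\bf z}_{i}) \), respectively), and setting the derivative to zero yields the weighted mean:

$$ \frac {d}{da}\sum \limits _{i=1}^n(X_{i}-a)^2w_i=-2\sum \limits _{i=1}^n(X_{i}-a)w_i=0 \quad \Longrightarrow \quad a=\frac {\sum \nolimits _{i=1}^nX_{i}w_i}{\sum \nolimits _{i=1}^nw_i}, $$

which coincides with (3) and (5) under the corresponding choice of the weights \(w_i \).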

Let us give examples of estimators constructed using the above theorem.

Example 1\(. \) We consider Model (1) with the regression function

$$f_{\bf \theta }({\bf t})={\theta }_1t_1^{{\theta }_2}t_2^{{\theta }_3}, $$

where \({\bf t}=(t_1,t_2)\in {\mathbb R}^2_+ \) and \({{\mathbf \theta }}=(\theta _1,\theta _2,\theta _3)\in {\mathbb R}^3_+\) are vectors with positive coordinates. This is the so-called Cobb-Douglas model, quite popular in econometrics (see, for example, [10]). Consider a continuous mapping from \((C[0,1]^{2}, \|\cdot \|_{pw}) \) to \({\mathbb R}^3_+ \) defined by the formula

$$ G(f)=(f(2^{-1},2^{-1}),f(2^{-1},3^{-1}),f(3^{-1},3^{-1})). $$

We now show that the superposition \({\bf g}({\bf \theta })\equiv G(f_{\bf \theta })\) is a homeomorphism from \({\mathbb R}^3_+\) onto the conic domain \(C^3_+=\{(r_1,r_2,r_3)\in {\mathbb R}^3_+:\thinspace 0<r_3<r_2<r_1\}\). It is enough to prove that \({\bf g}({{\mathbf \theta }}) \) is a bijection: continuity of this mapping is obvious, and continuity of the inverse follows from the explicit formulas (6) below. Indeed, consider the system of equations

$$ \begin {cases} f_{\bf \theta }(2^{-1},2^{-1})\equiv {\theta }_12^{-{\theta }_2}2^{-{\theta }_3}=r_1, \\ f_{\bf \theta }(2^{-1},3^{-1})\equiv {\theta }_12^{-{\theta }_2}3^{-{\theta }_3}=r_2, \\ f_{\bf \theta }(3^{-1},3^{-1})\equiv {\theta }_13^{-{\theta }_2}3^{-{\theta }_3}=r_3, \end {cases}$$

where \((r_1,r_2,r_3) \) is an arbitrary point from \(C^3_+ \). Taking logarithms of this system of equations, we reduce it to the equivalent form

$$ \begin {cases} \widetilde \theta _1-\theta _2\log 2-\theta _3\log 2=s_1, \\ \widetilde \theta _1-\theta _2\log 2-\theta _3\log 3=s_2, \\ \widetilde \theta _1-\theta _2\log 3-\theta _3\log 3=s_3, \end {cases} $$

where \(\widetilde \theta _1=\log {\theta _1} \), \(s_j=\log r_j \), \(j=1,2,3 \). It is easy to check that the matrix \(\mathbb A \) of this system of linear equations is nonsingular. Its only solution is easily found by sequentially eliminating variables:

$$ \widetilde \theta ^*_{n1}=\frac {s_1\log 3-s_3\log 2}{\log (3/2)},\quad \theta ^*_{n2}=\frac {s_2-s_3}{\log (3/2)},\quad \theta ^*_{n3}=\frac {s_1-s_2}{\log (3/2)}.$$
(6)

Now for our purposes we can use, for example, the kernel estimator \(f^*_{n,h}({\bf t})\) defined in (3). Put in (6)

$$ s_1=\log f^*_{n,h}(2^{-1},2^{-1}),\quad s_2=\log f^*_{n,h}(2^{-1},3^{-1}),\quad s_3=\log f^*_{n,h}(3^{-1},3^{-1}).$$

In view of the above, on the set of elementary outcomes of asymptotically full measure, for all sufficiently large \(n \), the double inequality \(s_1>s_2>s_3 \) holds, i.e., the three-dimensional estimator \((\widetilde \theta _{n1}^*,\theta _{n2}^*,\theta _{n3}^*) \) is well defined, and by Theorem 1, it will be consistent for the three-dimensional parameter \((\log {\theta _1},{\theta _2},{\theta _3}) \). So the estimator \({\mathbf \theta }_n^*=(\exp \{\widetilde \theta _{n1}^*\},\theta _{n2}^*,\theta _{n3}^*) \) will be consistent for the initial parameter \({\mathbf \theta }=({\theta _1},{\theta _2},{\theta _3}) \).
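
The resulting explicit estimator is easy to compute; a minimal sketch (assuming NumPy; the plug-in values may come from any of the consistent kernel estimators, e.g., (3) or (5), and the function name is ours):

```python
import numpy as np

def cobb_douglas_theta(f_at_half_half, f_at_half_third, f_at_third_third):
    """Explicit plug-in estimator of (theta_1, theta_2, theta_3) via (6).
    Arguments are kernel-estimator values at (1/2,1/2), (1/2,1/3), (1/3,1/3)."""
    s1, s2, s3 = np.log([f_at_half_half, f_at_half_third, f_at_third_third])
    c = np.log(1.5)                                    # log(3/2)
    theta1 = np.exp((s1 * np.log(3.0) - s3 * np.log(2.0)) / c)
    theta2 = (s2 - s3) / c                             # exponent of t_1
    theta3 = (s1 - s2) / c                             # exponent of t_2
    return theta1, theta2, theta3

# sanity check with the true regression function, theta = (1, 2, 3):
f = lambda t1, t2: 1.0 * t1 ** 2 * t2 ** 3
print(cobb_douglas_theta(f(0.5, 0.5), f(0.5, 1 / 3), f(1 / 3, 1 / 3)))  # ~ (1.0, 2.0, 3.0)
```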

Example 2\(. \) We consider Model (1) with the regression function

$$f_{\bf \theta }(t)=\frac {{\theta }_1t}{t+{\theta }_2},$$

where \(t>0 \) and \({{\mathbf \theta }}=(\theta _1,\theta _2)\in {\mathbb R}^2_+\). This is the so-called Michaelis-Menten model, well known in biochemistry (see, for example, [5, 6]). Consider a continuous mapping from \((C[0,1], \|\cdot \|_{pw}) \) to \({\mathbb R}^2_+ \)

$$ {\bf G}(f)=(f(1),f(1/2)). $$

Now we show that the superposition \({\bf g}({{\mathbf \theta }})\equiv {\bf G}(f_{\bf \theta })\) is a homeomorphism of the open positive quadrant \({\mathbb R}^2_+\) onto the open cone \( C^2_+=\{(r_1,r_2);\thinspace r_2>0,\thinspace r_2<r_1<2r_2\} \). It is enough to prove that \({\bf g}({{\mathbf \theta }}) \) is a bijection. Indeed, consider the system of equations

$$ \begin {cases} \frac {\theta _1}{1+\theta _2}= r_1, \\ \frac {2^{-1}\theta _1}{2^{-1}+\theta _2}=r_2 \end {cases} $$

for any vector \((r_1,r_2)\in C^2_+ \). This system obviously reduces to the following system of two linear equations in two unknowns \(\theta _1 \) and \(\theta _2 \):

$$ \begin {cases} \theta _1-r_1\theta _2=r_1, \\ \theta _1-2r_2\theta _2=r_2, \end {cases}$$

whose matrix is nonsingular everywhere in the above open cone. As a result, with a known vector \((r_1,r_2) \), we obtain

$$ \theta ^*_{n1}=r_1\left (1+\frac {r_1-r_2}{2r_2-r_1}\right ),\quad \theta ^*_{n2}=\frac {r_1-r_2}{2r_2-r_1}. $$
(7)

Obviously, the constructed one-to-one mapping is bilaterally continuous, i.e., is a homeomorphism of the above open domains.

Now we can consider the kernel estimator \(f^*_{n,h}({t}) \) defined in (4) or the Nadaraya–Watson estimator \(\widehat f_{n,h}({ t }) \) defined in (5). As was already proven in [2] and [18], if condition (D) is satisfied then both of these estimators are consistent in the norm \(\|\cdot \|_{pw}\) for some \(h\equiv h_n\to 0\). So, for example, one can put in (7)

$$ r_1=f^*_{n,h}(1),\quad r_2=f^*_{n,h}(1/2). $$

In this case, the two-dimensional estimator \({\mathbf \theta }_{n}^*=(\theta ^*_{n1},\theta ^*_{n2}) \) in (7) is well defined on a set of elementary outcomes of asymptotically full measure as \(n\to \infty \), and moreover, is consistent by virtue of Theorem 1.
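
A corresponding minimal sketch for the Michaelis–Menten model (the function name and the sanity check are our illustrative additions; \(r_1 \) and \(r_2 \) are the plug-in values just described):

```python
def michaelis_menten_theta(r1, r2):
    """Explicit plug-in estimator (7); r1, r2 are kernel-estimator values
    at t = 1 and t = 1/2, expected to satisfy r2 < r1 < 2*r2."""
    theta2 = (r1 - r2) / (2.0 * r2 - r1)
    theta1 = r1 * (1.0 + theta2)
    return theta1, theta2

# sanity check with the true regression function, theta = (2, 0.5):
f = lambda t: 2.0 * t / (t + 0.5)
print(michaelis_menten_theta(f(1.0), f(0.5)))    # ~ (2.0, 0.5)
```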

Remark 4\(. \) Note that explicit estimators for the parameters of the Michaelis–Menten model were known earlier (see [22]). In particular, the estimators in [22] are constructed, in essence, owing to the intrinsic linearity of this model, and conditions like (D) are not required in [22]. However, the fundamental difference between [22] and the results of the present paper is that the regressors \(\{{\bf z}_i\} \) in [22] are nonrandom and the random errors \(\{\xi _i\} \) are independent.

Example 3\(. \) We consider the so-called logistic regression. In this case,

$$f_{\theta }({\bf t})=\left (1+ e^{-({\bf t},{\mathbf \theta })}\right )^{-1}, $$

where \(\dim {\bf t}=\dim {{\mathbf \theta }}=m \) and \((\cdot ,\cdot ) \) is the standard Euclidean inner product in \({\mathbb R}^m \).

Next, for an arbitrary set of points \(\{{\bf t}_j; \thinspace j=1,\ldots ,m\}\) from the unit \(m \)-dimensional cube, the system of equations (2) is easily reduced to the following system of linear equations:

$$ \begin {cases} ({\bf t}_1,{\mathbf \theta })=r_1, \\ \ldots \ldots \ldots \ldots \\ ({\bf t}_m,{\mathbf \theta })=r_m. \end {cases} $$

The only restriction on the vectors \(\{{\bf t}_j\} \) is their linear independence, i.e., nondegeneracy of the matrix \( \mathbb T\) of the reduced system. Now, let us put

$$r_j=\log \frac {f_n^*({\bf t}_j)}{1-f_n^*({\bf t}_j)},$$

where \(f_n^*({\bf t}) \) is any of the estimators (3) or (5). Note that, due to the \( \|\cdot \|_{pw}\)-consistency of these estimators under condition (D) alone, the inequalities \(0<f_n^*({\bf t}_j)<1 \) hold with probability tending to 1 as \(n\to \infty \) for any fixed \({\bf t}_j \). If these inequalities are violated, we set \(r_j=0 \). We finally obtain the following consistent estimator for the logistic model:

$$ {{\mathbf \theta }}^*_n={\mathbb T}^{-1} (r_1,\ldots ,r_m)^{\top }.$$
(8)
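
A minimal sketch of the estimator (8), assuming NumPy; the matrix \({\mathbb T} \) has the points \({\bf t}_j \) as rows, and the convention \(r_j=0 \) for estimator values outside \((0,1) \) follows the rule stated above. All names are illustrative.

```python
import numpy as np

def logistic_theta(T, f_values):
    """Explicit estimator (8): theta*_n = T^{-1} (r_1, ..., r_m)^T,
    where r_j = logit(f*_n(t_j)) and r_j = 0 if f*_n(t_j) lies outside (0, 1)."""
    T = np.asarray(T, dtype=float)
    f = np.asarray(f_values, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        r = np.where((f > 0.0) & (f < 1.0), np.log(f / (1.0 - f)), 0.0)
    return np.linalg.solve(T, r)

# sanity check with the true regression function, theta_0 = (1, -2):
theta0 = np.array([1.0, -2.0])
T = np.array([[1.0, 0.0], [0.5, 0.5]])            # linearly independent points t_j in [0,1]^2
f_true = 1.0 / (1.0 + np.exp(-(T @ theta0)))      # exact values f_{theta_0}(t_j)
print(logistic_theta(T, f_true))                  # ~ [ 1. -2.]
```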

3. ANALYSIS OF \(\alpha _n\)-CONSISTENCY OF THE ESTIMATORS

Now we discuss the question of refining the result of Theorem 1. Recall that an estimator \( {{\mathbf \theta }}^*_n\) of the parameter \({\mathbf \theta }\in {\mathbb R}^m\) is called \(\alpha _n \)-consistent if \(\alpha _n({{\mathbf \theta }}^*_n-{{\mathbf \theta }})\stackrel {p}{\to } {\bf 0} \) and \(\alpha _n\to \infty \) as \(n\to \infty \). This definition is easily extended to the case of infinite-dimensional parameters.

Definition\(. \) Let \(({\cal C}, d) \) be a separable metric space. A sequence of random elements \( g_n^*\in {\cal C}\) is called an \(\alpha _n \)-consistent estimator of a random element \(g\in {\cal C} \) if \(\alpha _nd(g_n^*,g)\stackrel {p}{\to } 0\) and \(\alpha _n\to \infty \) as \(n\to \infty \).

We now present a refinement of Theorem 1 under appropriate detailing of the restrictions.

Theorem 2\(. \) Under the conditions of Theorem 1, let the mappings \(\bf G\) and \( {\bf g}^{-1}\) satisfy the Lipschitz condition, and let there exist a nonparametric \(\alpha _n \)-consistent \(( \)in the norm of the space \((C[0,1]^{k}, \|\cdot \|)\)\() \) estimator \(f^*_n \) of the unknown regression function \(f_{\theta _0} \). Then the estimator \( {{\mathbf \theta }}_n^*\) is \(\alpha _n\)-consistent for the true parameter \({{\mathbf \theta }}_0 \).

The proof immediately follows from the construction of the estimator \({{\mathbf \theta }}_n^*={\bf g}^{-1}({\bf G}(f_n^*)) \) in Theorem 1 and the above definition of an \(\alpha _n \)-consistent nonparametric estimator \(f^*_n \). \(\quad \square \)

As is easy to see, the main condition in this theorem is the existence of an \(\alpha _n \)-consistent nonparametric estimator \(f^*_n \). The following statement gives an example of such a nonparametric estimator.

Theorem 3\(. \) For each \( {{\mathbf \theta }}\in {\bf \Theta } \) in Model (1), let the regression function \(f_{\theta }({\bf t}) \) satisfy the Lipschitz condition and not be identically constant. Then, under condition (D), the kernel estimator (3) is \( \alpha _n\)-consistent for \( \alpha _n=o(h_n^{-1})\), where

$$ h_n=\left ({\mathbb E}(\varepsilon _n^{kp/2})\right )^{\frac {1}{p(k/2+1)+k}}.$$

Proof. In [17], it is shown that, under the conditions of the model under consideration, for the kernel estimator (3) the following inequality is valid with probability 1:

$$ \|f^*_{n,h}-f_{\bf \theta _0}\|_{sup}\le \omega _{f_{\theta _0}}(h)+\zeta _n(h),$$
(9)

where \(\omega _{f_{\theta _0}}(h)\) is the modulus of continuity of the true regression function (which is strictly positive for each \(h>0\) since \(f_{\theta _0}\) is not identically constant), and the random variable \(\zeta _n(h)\) has the following order in probability:

$$ \zeta _n(h)=O_p\left (\left (h^{-k(p/2+1)}\thinspace {\mathbb E}(\varepsilon _n^{kp/2})\right )^{1/p}\right ); $$

here the upper bound in the definition of the symbol \(O_p(\cdot ) \) depends on \(k \), \(p \), \(M_p \), and the kernel \(K \). The optimal window width \(h\equiv h_n \), which equalizes the orders of both terms on the right-hand side of (9), is found as the solution of the equation

$$ {\mathbb E}(\varepsilon _n^{kp/2})=h^{k(p/2+1)}\omega ^p_{f_{\theta _0}}(h).$$
(10)

For a Lipschitz regression function, we thus obtain a representation for the optimal (in order of smallness) window width:

$$ h_n=\left ({\mathbb E}(\varepsilon _n^{kp/2})\right )^{\frac {1}{p(k/2+1)+k}},$$
(11)

which is what needed to be shown. \(\quad \square \)

Thus, if we put \(\alpha _n=o(h_n^{-1}) \) then, in all the above examples, substituting the nonparametric kernel estimator \(f^*_{n,h_n}\) and using Theorems 2 and 3, we obtain \(\alpha _n\)-consistent estimators for the multidimensional parameters of the nonlinear regression models under consideration. For example, for the estimator (8), this statement follows from the fact that the function \(\phi (z)=\log \frac {z}{1-z} \) satisfies the Lipschitz condition in a neighborhood of each point of the open interval \((0,1)\) and that, in addition, by virtue of (8), on a set of elementary outcomes of asymptotically full measure the inequality

$$ \|{\mathbf \theta }_n^*-{\mathbf \theta }_0\|\le C\max _{j\le m}|f_n^*({\bf t}_j)-f_{\theta _0}({\bf t}_j)| $$
(12)

holds for all sufficiently large \(n \), where the nonrandom constant \( C \) depends on the collection \(\{{\bf t}_j\} \) and \(m \).

Remark 5\(. \) If under condition (D) the minimal radius of an \(\varepsilon \)-net admits a deterministic upper bound \(\varepsilon _n\le \hat \varepsilon _n\), then formula (11) for the window width can be transformed into the bound

$$ h_n\leq \widehat h_n\equiv \hat \varepsilon _n^{\frac {k}{k+2+2k/p}}. $$

Moreover, for a sufficiently large \(p \) (for example, when the noise is Gaussian), the order of smallness of the window width can be made arbitrarily close to the value \(\hat \varepsilon _n^{{k}/(k+2)}\). As an example, we can consider a one-dimensional (\(k=1\)) equidistant design, when in the estimator (4) we should put \(z_{n:i}=i/n \) and \(\hat \varepsilon _n=\Delta z_{ni}=1/n\). In this case, for large \(p \), \(\hat h_n\approx \hat \varepsilon _n^{{1}/{3}}=n^{-{1}/{3}}\). It is also worth noting that, in the case of independent identically distributed random regressors \(\{{\bf z}_i; \; i=1,\ldots ,n\}\) in Model (1), it is not difficult to establish asymptotic normality of the estimators considered above using known results on asymptotic normality of the kernel estimators (3)–(5). For example, for \(k=1 \), the normalization of the residual \(|f_n^*({\bf t}_j)-f_{\theta _0}({\bf t}_j)|\) under broad conditions has order \(\sqrt {n h_n}\) for some sequence \(h_n \) (the window width) tending to zero at a power-law rate in \(n \), which gives an upper (unattainable) bound for the normalizing factor \(\alpha _n\) in the definition of \( \alpha _n\)-consistency.
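
As a small numerical illustration of this remark (assuming the deterministic bound \(\hat \varepsilon _n=1/n \) of the equidistant design; the function name is ours, and the formula is simply (11) with \({\mathbb E}(\varepsilon _n^{kp/2}) \) replaced by \(\hat \varepsilon _n^{kp/2} \)):

```python
def window_width_bound(eps_hat, k, p):
    """Upper bound for the optimal window width (11) when eps_n <= eps_hat:
    h_n <= eps_hat ** (k*p/2 / (p*(k/2 + 1) + k))."""
    return eps_hat ** (0.5 * k * p / (p * (0.5 * k + 1.0) + k))

# k = 1 equidistant design, eps_hat = 1/n; the exponent approaches 1/3 as p grows
for n in (10 ** 2, 10 ** 4, 10 ** 6):
    print(n, window_width_bound(1.0 / n, k=1, p=20), n ** (-1 / 3))
```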

Remark 6\(. \) The approach proposed here for constructing \(\alpha _n \)-consistent estimators of nonlinear regression parameters is somewhat inferior in accuracy to the estimators in [16] and [19], where another method was proposed for constructing \(\alpha _n \)-consistent and asymptotically normal estimators without using nonparametric kernel estimators. Under broad conditions, the value of \(\alpha _n \) in the mentioned papers can be arbitrarily close to \(\sqrt n \), while the approach based on the bound (12) yields \(\alpha _n \)-consistent estimators with the order of growth of \(\alpha _n \) no more than \(n^{1/3} \) (see Remark 5). However, the explicit estimators obtained by the approach of Theorem 1 in many models look much simpler than the estimators proposed in [16] and [19] for the same models. In addition, it is important to note that the accuracy of these estimators can be significantly improved with the help of one-step procedures, for which preliminary \(n^{1/4}\)-consistent estimators can be used under a wide range of conditions (see, for example, [13,14,15]). So the estimators proposed in the present paper can be used as preliminary (initial) estimators for multistep Newton-type procedures.