
Two-stage estimation and simultaneous confidence band in partially nonlinear additive model

  • Published in: Metrika

Abstract

In this paper, we focus on estimation and inference in the partially nonlinear additive model, on which, to the best of our knowledge, little research has been conducted. By integrating spline approximation and local smoothing, we propose a two-stage estimation approach in which the profile nonlinear least squares method is used to estimate the parameters and additive functions. Under some regularity conditions, we establish the asymptotic normality of the parametric estimators and achieve the optimal nonparametric convergence rate for the fitted functions. Furthermore, a spline-backfitted local linear estimator is proposed for the additive functions, and its asymptotic distribution is also established. To make inference on the nonparametric functions as a whole, we construct theoretical simultaneous confidence bands, and further propose an empirical bootstrap-based confidence band to ease the heavy computational burden of implementation. Finally, both Monte Carlo simulations and a real data analysis show the good performance of the proposed methods.


(Figures 1–7 are omitted in this version.)


References

  • Bates DM, Watts DG (1988) Nonlinear regression analysis and its applications. Wiley, New York

  • Biedermann S, Dette H, Woods DC (2011) Optimal design for additive partially nonlinear models. Biometrika 98(2):449–458

  • Bowman AW (1984) An alternative method of cross-validation for the smoothing of density estimates. Biometrika 71(2):353–360

  • Breiman L, Friedman JH (1985) Estimating optimal transformations for multiple regression and correlation. J Am Stat Assoc 80(391):580–598

  • Cai Z, Xu X (2008) Nonparametric quantile estimations for dynamic smooth coefficient models. J Am Stat Assoc 103(484):1595–1608

  • Claeskens G, Van Keilegom I (2003) Bootstrap confidence bands for regression curves and their derivatives. Ann Stat 31(6):1852–1884

  • Currie DJ (1982) Estimating Michaelis–Menten parameters: bias, variance and experimental design. Biometrics 38(4):907–919

  • De Boor C (2001) A practical guide to splines. Springer, New York

  • Donthi R, Prasad SV, Mahaboob B, Praveen JP, Venkateswarlu B (2019) Estimation methods of nonlinear regression models. In: AIP conference proceedings 2177(1):020081. AIP Publishing

  • Fan J, Gijbels I (1996) Local polynomial modelling and its applications. Chapman & Hall, London

  • Fan J, Härdle W, Mammen E et al (1998) Direct estimation of low-dimensional components in additive models. Ann Stat 26(3):943–971

  • Fan J, Zhang W (2000) Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scand J Stat 27(4):715–731

  • Härdle W, Liang H, Gao J (2012) Partially linear models. Springer Science & Business Media

  • Härdle W, Sperlich S, Spokoiny V (2001) Structural tests in additive regression. J Am Stat Assoc 96(456):1333–1347

  • Harrison D Jr, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manag 5(1):81–102

  • Hart JD, Wehrly TE (1993) Consistency of cross-validation when the data are curves. Stoch Process Appl 45(2):351–361

  • Huang L-S, Yu C-H (2019) Classical backfitting for smooth-backfitting additive models. J Comput Graph Stat 28(2):386–400

  • Imhof LA et al (2001) Maximin designs for exponential growth models and heteroscedastic polynomial models. Ann Stat 29(2):561–576

  • Jiang Y, Tian G-L, Fei Y (2019) A robust and efficient estimation method for partially nonlinear models via a new MM algorithm. Stat Pap 60(6):2063–2085

  • Kong E, Xia Y (2012) A single-index quantile regression model and its estimation. Econom Theory 28(4):730–768

  • Li G, Peng H, Tong T (2013) Simultaneous confidence band for nonparametric fixed effects panel data models. Econ Lett 119(3):229–232

  • Li Q (2000) Efficient estimation of additive partially linear models. Int Econ Rev 41(4):1073–1092

  • Li Q, Racine JS (2007) Nonparametric econometrics: theory and practice. Princeton University Press, Princeton

  • Li R, Nie L (2007) A new estimation procedure for a partially nonlinear model via a mixed-effects approach. Can J Stat 35(3):399–411

  • Li R, Nie L (2008) Efficient statistical inference procedures for partially nonlinear models and their applications. Biometrics 64(3):904–911

  • Li Y, Ruppert D (2008) On the asymptotics of penalized splines. Biometrika 95(2):415–436

  • Liang H, Thurston SW, Ruppert D, Apanasovich T, Hauser R (2008) Additive partial linear models with measurement errors. Biometrika 95(3):667–678

  • Liu X, Wang L, Liang H (2011) Estimation and variable selection for semiparametric additive partial linear models. Stat Sin 21(3):1225–1248

  • Ma S, Lian H, Liang H, Carroll R (2017) SiAM: a hybrid of single index models and additive models. Electron J Stat 11(1):2397–2423

  • Ma S, Yang L (2011) Spline-backfitted kernel smoothing of partially linear additive model. J Stat Plan Inference 141(1):204–219

  • Mammen E, Linton O, Nielsen JP (1999) The existence and asymptotic properties of a backfitting projection algorithm under weak conditions. Ann Stat 27(5):1443–1490

  • Manzan S, Zerom D (2005) Kernel estimation of a partially linear additive model. Stat Probab Lett 72(4):313–322

  • Nielsen JP, Sperlich S (2005) Smooth backfitting in practice. J R Stat Soc B 67(1):43–61

  • Riazoshams H, Midi H, Ghilagaber G (2018) Robust nonlinear regression: with applications using R. Wiley, Hoboken

  • Rice JA, Silverman BW (1991) Estimating the mean and covariance structure nonparametrically when the data are curves. J R Stat Soc B 53(1):233–243

  • Rudemo M (1982) Empirical choice of histograms and kernel density estimators. Scand J Stat 9(2):65–78

  • Schumaker LL (1981) Spline functions: basic theory. Wiley, New York

  • Seber GA, Wild CJ (2003) Nonlinear regression. Wiley-Interscience, Hoboken

  • Severini TA, Wong WH et al (1992) Profile likelihood and conditionally parametric models. Ann Stat 20(4):1768–1802

  • Silverman BW (1986) Density estimation for statistics and data analysis. Chapman and Hall, London

  • Song L, Zhao Y, Wang X (2010) Sieve least squares estimation for partially nonlinear models. Stat Probab Lett 80(17–18):1271–1283

  • Stone CJ et al (1984) An asymptotically optimal window selection rule for kernel density estimates. Ann Stat 12(4):1285–1297

  • Su L, Ullah A (2006) Profile likelihood estimation of partially linear panel data models with fixed effects. Econ Lett 92(1):75–81

  • Tjøstheim D, Auestad BH (1994) Nonparametric identification of nonlinear time series: projections. J Am Stat Assoc 89(428):1398–1409

  • Wang J, Yang L (2009) Efficient and fast spline-backfitted kernel smoothing of additive models. Ann Inst Stat Math 61(3):663–690

  • Wang Z, Xue L, Liu J (2019) Checking nonparametric component for partially nonlinear model with missing response. Stat Probab Lett 149:1–8

  • Wu TZ, Yu K, Yu Y (2010) Single-index quantile regression. J Multivar Anal 101(7):1607–1621

  • Xiao Y, Tian Z, Li F (2014) Empirical likelihood-based inference for parameter and nonparametric function in partially nonlinear models. J Korean Stat Soc 43(3):367–379

  • Xie H, Huang J et al (2009) SCAD-penalized regression in high-dimensional partially linear models. Ann Stat 37(2):673–696

  • Yang L, Park BU, Xue L, Härdle W (2006) Estimation and testing for varying coefficients in additive models with marginal integration. J Am Stat Assoc 101(475):1212–1227

  • Yang L, Sperlich S, Härdle W (2003) Derivative estimation and testing in generalized additive models. J Stat Plan Inference 115(2):521–542

  • Yu K, Lu Z (2004) Local linear additive quantile regression. Scand J Stat 31(3):333–346

  • Zhang HH, Cheng G, Liu Y (2011) Linear or nonlinear? Automatic structure discovery for partially linear models. J Am Stat Assoc 106(495):1099–1112

  • Zhang Y, Lian H, Yu Y (2017) Estimation and variable selection for quantile partially linear single-index models. J Multivar Anal 162:215–234

  • Zhou S, Shen X, Wolfe D (1998) Local asymptotics for regression splines and confidence regions. Ann Stat 26(5):1760–1782

  • Zhou X, Zhao P, Liu Z (2016) Estimation and inference for additive partially nonlinear models. J Korean Stat Soc 45(4):491–504

Acknowledgements

Li’s research was supported by the grant from the National Social Science Fund of China (No. 17BTJ025) and the Open Research Fund of Key Laboratory of Advanced Theory and Application in Statistics and Data Science (East China Normal University), Ministry of Education (No. KLATASDS1802).

Author information

Correspondence to Rui Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

To begin, we review some properties of B-spline functions. Let \(\mathbf{B}(u)=(B_1(u),\ldots ,B_L(u))^\top \) be the B-spline basis on [0, 1]; then \(B_l(u)\ge 0\) and \(\sum _{l=1}^{L}B_l(u)=\sqrt{L}\) for each \(u\in [0,1]\). Moreover, for any vector \(\varvec{\theta }=(\theta _1,\ldots ,\theta _L)^\top \) and some constants \(0<C_1<C_2\),

$$\begin{aligned} C_1\Vert \varvec{\theta }\Vert ^2\le \int \left\{ \varvec{\theta }^\top \mathbf{B}(u)\right\} ^2du \le C_2\Vert \varvec{\theta }\Vert ^2. \end{aligned}$$
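These basis properties are easy to verify numerically. The following Python sketch (a toy illustration; the cubic degree and knot layout are our own choices, not fixed by the text) evaluates a clamped B-spline basis via the Cox–de Boor recursion and checks the partition of unity. Note that the standard normalized basis sums to 1, so the \(\sqrt{L}\) above corresponds to the rescaled basis \(\sqrt{L}\,B_l\) often used in spline-backfitting work.

```python
def bspline_basis(l, k, t, u):
    """Cox-de Boor recursion: value of the l-th degree-k B-spline at u."""
    if k == 0:
        return 1.0 if t[l] <= u < t[l + 1] else 0.0
    left = 0.0 if t[l + k] == t[l] else \
        (u - t[l]) / (t[l + k] - t[l]) * bspline_basis(l, k - 1, t, u)
    right = 0.0 if t[l + k + 1] == t[l + 1] else \
        (t[l + k + 1] - u) / (t[l + k + 1] - t[l + 1]) * bspline_basis(l + 1, k - 1, t, u)
    return left + right

k = 3                                    # cubic B-splines
interior = [j / 5 for j in range(1, 5)]  # interior knots on (0, 1)
t = [0.0] * (k + 1) + interior + [1.0] * (k + 1)  # clamped knot vector
L = len(t) - k - 1                       # number of basis functions

for u in [0.01, 0.33, 0.5, 0.77, 0.99]:
    total = sum(bspline_basis(l, k, t, u) for l in range(L))
    assert abs(total - 1.0) < 1e-12      # partition of unity on [0, 1)
```

Each \(B_l\) is also nonnegative by construction, matching the first property stated above.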

Lemma 1

If a function \(\phi (u)\) defined on the support \(\mathcal {U}\) has continuous derivatives up to order \(r\ge 2\), then there exists \(\varvec{\gamma }=(\gamma _1,\ldots ,\gamma _L)^\top \) such that

$$\begin{aligned} \sup _{u\in \mathcal {U}}|\phi (u)-\mathbf{B}^\top (u)\varvec{\gamma }|=O(L^{-r}). \end{aligned}$$

Proof

Lemma 1 follows directly from Theorem XII.1 in De Boor (2001).

\(\square \)
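The rate in Lemma 1 can be seen in a toy computation. The Python sketch below is our own illustration with linear splines (so effectively \(r=2\)), using interpolation at equally spaced knots as a stand-in for the best spline approximant: doubling the number of subintervals should shrink the sup-norm error by roughly \(2^r=4\).

```python
import math

def sup_interp_error(phi, L, grid=2000):
    """Sup-norm error of piecewise-linear interpolation of phi on L equal subintervals of [0, 1]."""
    knots = [j / L for j in range(L + 1)]
    vals = [phi(x) for x in knots]
    err = 0.0
    for i in range(grid + 1):
        u = i / grid
        j = min(int(u * L), L - 1)      # subinterval containing u
        w = (u - knots[j]) * L          # local coordinate in [0, 1]
        approx = (1 - w) * vals[j] + w * vals[j + 1]
        err = max(err, abs(phi(u) - approx))
    return err

phi = lambda u: math.sin(2 * math.pi * u)
e1, e2 = sup_interp_error(phi, 10), sup_interp_error(phi, 20)
ratio = e1 / e2
assert 3.0 < ratio < 5.0   # doubling L cuts the sup error by about 2^r = 4
```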

Lemma 2

If conditions (B1)–(B2) are satisfied, then

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}K_h(u_{ik}-u_0) \begin{pmatrix} 1 &{} u_{ik}-u_0\\ (u_{ik}-u_0)/h^2 &{} (u_{ik}-u_0)^2/h^2 \end{pmatrix} =\mathbf{Q}+o_p(1), \end{aligned}$$

where

$$\begin{aligned} \mathbf{Q}=\begin{pmatrix} f(u_0) &{} 0\\ \kappa _2f'(u_0)&{}\kappa _2f(u_0) \end{pmatrix}, \end{aligned}$$

moreover, it follows that

$$\begin{aligned} \mathbf{Q}^{-1}= \begin{pmatrix} 1/f(u_0) &{} 0\\ -f'(u_0)/f^2(u_0)&{}1/(\kappa _2f(u_0)) \end{pmatrix}. \end{aligned}$$

Proof

Exercise 2.7 in Li and Racine (2007) gives this result, so we omit the proof. \(\square \)
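The closed form of \(\mathbf{Q}^{-1}\) above is the inverse of a lower-triangular \(2\times 2\) matrix. As a quick numeric sanity check (Python; the values of \(f(u_0)\), \(f'(u_0)\) and \(\kappa _2\) are arbitrary illustrative numbers):

```python
f, fp, k2 = 0.8, 0.3, 0.2          # illustrative f(u0), f'(u0), kappa_2
Q    = [[f, 0.0], [k2 * fp, k2 * f]]
Qinv = [[1 / f, 0.0], [-fp / f**2, 1 / (k2 * f)]]

# the 2x2 product Q @ Qinv should be the identity matrix
prod = [[sum(Q[i][m] * Qinv[m][j] for m in range(2)) for j in range(2)]
        for i in range(2)]
for i in range(2):
    for j in range(2):
        assert abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-12
```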

Lemma 3

Suppose conditions (B1)–(B2) hold, and denote

$$\begin{aligned} m(u_{ik},u_0)=(1,0)(\mathbf{D}^{\top }(u_0)\mathbf{W}(u_0)\mathbf{D}(u_0))^{-1}\mathbf{D}_i(u_0)K_h(u_{ik}-u_0) \end{aligned}$$

with \(\mathbf{D}_i(u_0)\) being the ith column of \(\mathbf{D}^{\top }(u_0)\); then

\((1)\, m(u_{ik},u_0)=n^{-1}K_h(u_{ik}-u_0)f^{-1}(u_0)\{1+o_p(1)\};\)

\((2)\, \lim _{n\rightarrow \infty }P\{\underset{u_0\in [0,1]}{\sup }\underset{1\le i\le n}{\max }| m(u_{ik},u_0)|\le C(nh)^{-1}\}=1\).

Proof

The conclusions can be derived from Lemma 4.1 in Su and Ullah (2006). \(\square \)
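The weights \(m(u_{ik},u_0)\) of Lemma 3 are the entries of the first row of \((\mathbf{D}^{\top }\mathbf{W}\mathbf{D})^{-1}\mathbf{D}^{\top }\mathbf{W}\). A small Python sketch (our own toy setup: Gaussian kernel, uniform design, arbitrary \(n\), \(h\), \(u_0\)) verifies the defining property that these local linear weights reproduce constants and centered linear terms exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, h, u0 = 200, 0.1, 0.5
u = rng.uniform(0, 1, n)

K = np.exp(-0.5 * ((u - u0) / h) ** 2)      # Gaussian kernel weights (up to a constant factor)
D = np.column_stack([np.ones(n), u - u0])   # local linear design matrix D(u0)
W = np.diag(K)

# equivalent-kernel weights: first row of (D^T W D)^{-1} D^T W
S = np.linalg.solve(D.T @ W @ D, D.T @ W)
w = S[0]

assert abs(w.sum() - 1.0) < 1e-10           # reproduces constants exactly
assert abs(w @ (u - u0)) < 1e-8             # annihilates the centered linear term
```

These two identities are what drive the bias expansion used in the proofs of Theorems 3 and 4.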

Proof of Theorem 1

Let \(\varvec{\gamma }_{0}=(\varvec{\gamma }^\top _{01},\ldots ,\varvec{\gamma }^\top _{0p})^{\top }\) collect the true spline coefficients of \(\alpha _{0k}(\cdot )\) for \(k=1,2,\ldots ,p\), and denote \(R_{k}(u_{k})=\alpha _{0k}(u_{k})-\mathbf{B}^{\top }(u_{k})\varvec{\gamma }_{0k}\),

$$\begin{aligned} \mathbf{R}(u)=(R_1(u_1),R_{2}(u_2),\ldots ,R_{p}(u_p))^{\top }\, \mathrm {and} \, \mathbf{R}_{p\times n}=(\mathbf{R}(u_1),\mathbf{R}(u_{2}),\ldots ,\mathbf{R}(u_n)). \end{aligned}$$

By assumptions (A3)–(A5) and Corollary 6.21 in Schumaker (1981), we get \(\Vert \mathbf{B}(u)\Vert =O(\sqrt{L})\) and \(\Vert R_{k}(u)\Vert =O(L^{-r})\), with r defined in condition (A4).

To prove the \(\sqrt{n}\)-consistency of \({{\widehat{\varvec{\beta }}}}\), it suffices to show that for any \(\zeta >0\), there exists a sufficiently large constant \(C>0\) such that

$$\begin{aligned} P\{ \underset{\Vert \mathbf{v}\Vert =C}{\inf }Q(\varvec{\beta }_0+n^{-1/2}\mathbf{v})>Q(\varvec{\beta }_0) \}\ge 1-\zeta , \end{aligned}$$
(6.13)

where \(\mathbf{v}\) is an m-dimensional constant vector. Let \(Q'(\cdot )\) and \(Q''(\cdot )\) be the first- and second-order derivatives of \(Q(\cdot )\), respectively. Taking the Taylor expansion of \(Q(\cdot )\) at \(\varvec{\beta }_0\), we get

$$\begin{aligned} Q(\varvec{\beta }_0+n^{-1/2}\mathbf{v})-Q(\varvec{\beta }_0) =n^{-1/2}Q'(\varvec{\beta }_0)^{\top }\mathbf{v}+\frac{1}{2n}\mathbf{v}^{\top }Q''(\varvec{\beta }^{*})\mathbf{v}+o(n^{-1}) \end{aligned}$$
(6.14)

where \(\varvec{\beta }^{*}\) lies between \(\varvec{\beta }_0\) and \(\varvec{\beta }_0+n^{-1/2}\mathbf{v}\). Let \(E_{\mathcal {A}}\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)\) be the projection of \(\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)\) onto the function class \(\mathcal {A}\); then there exists a vector \(\varvec{\gamma }^*\) such that \(E_{\mathcal {A}}\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)-\mathbf{Z}\varvec{\gamma }^*=O(L^{-r})\). Consequently, \(\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)-E_{\mathcal {A}}\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)\) and \(E_{\mathcal {A}}\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)\) are orthogonal, and \(\Vert [\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)]^{\top }(\mathbf{I}-\mathbf{M}_\mathbf{z})\Vert =\Vert [\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)-E_\mathcal {A}\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)]^{\top }(\mathbf{I}-\mathbf{M}_\mathbf{z})\Vert =O_p(K/\sqrt{n}+L^{-r})\). Thus, some calculations based on the Cauchy–Schwarz inequality show that

$$\begin{aligned} \begin{aligned} Q'(\varvec{\beta }_0)&=-2[\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)-E_{\mathcal {A}}\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)]^{\top }\mathbf{R}^{\top }\mathbf{1}_{p} -2[\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)-E_{\mathcal {A}}\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)]^{\top }{\varvec{\varepsilon }}\\&\quad +O_p(L^{-r}+K/\sqrt{n}) \end{aligned} \end{aligned}$$
(6.15)

in which \((\mathbf{I}-\mathbf{M}_\mathbf{z})\mathbf{Z}=\mathbf{P}\mathbf{Z}=\mathbf{0}\) with \(\mathbf{M}_\mathbf{z}=\mathbf{Z}(\mathbf{Z}^{\top }\mathbf{Z})^{-1}\mathbf{Z}^{\top }\). Note that \(\mathrm {E}(\varepsilon )=0\) and \(\mathrm {var}(\varepsilon )=\sigma ^{2}\), then

$$\begin{aligned}&\mathrm {E}\left[n^{-1/2}\sum _{i=1}^{n} \{g'(\mathbf{x}_i, \varvec{\beta })-E_{\mathcal {A}}g'(\mathbf{x}_i, \varvec{\beta })\}\varepsilon _{i}|\mathbf{X},\mathbf{U}\right]=\mathbf{0}, \quad \mathrm {and} \\&\mathrm {var}\left[n^{-1/2}\sum _{i=1}^{n} \{g'(\mathbf{x}_i, \varvec{\beta })-E_{\mathcal {A}}g'(\mathbf{x}_i, \varvec{\beta })\}\varepsilon _{i}|\mathbf{X},\mathbf{U}\right] =\sigma ^2 \mathrm {E}[\{g'(\mathbf{x}_1, \varvec{\beta }) \\&\quad -E_{\mathcal {A}}g'(\mathbf{x}_1, \varvec{\beta })\}^{\otimes 2}|\mathbf{X},\mathbf{U}], \end{aligned}$$

which, combined with the Lindeberg–Lévy central limit theorem, leads to

$$\begin{aligned} n^{-1/2}\sum _{i=1}^{n}\{g'(\mathbf{x}_i, \varvec{\beta })-E_{\mathcal {A}}g'(\mathbf{x}_i, \varvec{\beta })\}\varepsilon _{i} \rightarrow _d N(0,\sigma ^{2}{\varvec{\Sigma }}_{\varvec{\beta }} ) \end{aligned}$$
(6.16)

where \({\varvec{\Sigma }}_{\varvec{\beta }}=\mathrm {E}[\{g'(\mathbf{x}_1, \varvec{\beta })-E_{\mathcal {A}}g'(\mathbf{x}_1, \varvec{\beta })\}^{\otimes 2}|\mathbf{X},\mathbf{U}]\). Consequently,

$$\begin{aligned} n^{-1/2}Q'(\varvec{\beta }_0)^\top \mathbf{v}&=-2n^{-1/2}\mathbf{1}_{p}^\top \mathbf{R}[\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)-E_{\mathcal {A}}\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)]\mathbf{v}\\&\quad -2n^{-1/2}{\varvec{\varepsilon }}^{\top }[\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)-E_{\mathcal {A}}\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)]\mathbf{v}+O_p(L^{-r}/\sqrt{n}+K/n) \end{aligned}$$

which, together with assumptions (A1)–(A2), leads to

$$\begin{aligned} n^{-1/2}Q'(\varvec{\beta }_0)^{\top }\mathbf{v}=O_{p}(\Vert \mathbf{v}\Vert ). \end{aligned}$$
(6.17)

Similarly, some calculations with \(\mathbf{g}(\mathbf{X},\varvec{\beta }_0)-\mathbf{g}(\mathbf{X}, \varvec{\beta }^{*})=O_{p}(n^{-1/2}\Vert \mathbf{v}\Vert )\) show that

$$\begin{aligned} \begin{aligned} Q''(\varvec{\beta }^{*})&=2[\mathbf{g}'(\mathbf{X}, \varvec{\beta }^{*})-E_{\mathcal {A}}\mathbf{g}'(\mathbf{X}, \varvec{\beta }^{*})]^{\top } [\mathbf{g}'(\mathbf{X}, \varvec{\beta }^{*})-E_{\mathcal {A}}\mathbf{g}'(\mathbf{X}, \varvec{\beta }^{*})]\\&\quad -2[\mathbf{g}''(\mathbf{X}, \varvec{\beta }^{*})-E_{\mathcal {A}}\mathbf{g}''(\mathbf{X}, \varvec{\beta }^{*})]^{\top } (\mathbf{R}^{\top }\mathbf{1}_{p} +{\varvec{\varepsilon }})+O_p(L^{-r}+K/\sqrt{n}), \end{aligned} \end{aligned}$$

which, combined with (6.16) and the idempotence of \(\mathbf{I}-\mathbf{M}_\mathbf{z}\), results in

$$\begin{aligned} \begin{aligned} \frac{1}{2n}\mathbf{v}^{\top }Q''(\varvec{\beta }^{*})\mathbf{v}&=O_{p}(\Vert \mathbf{v}\Vert ^{2}). \end{aligned} \end{aligned}$$
(6.18)

Therefore, the combination of (6.14), (6.17) and (6.18) indicates that

$$\begin{aligned} P(Q(\varvec{\beta }_0+n^{-1/2}\mathbf{v})-Q(\varvec{\beta }_0)>0)\rightarrow 1 \end{aligned}$$

on the basis of the fact that \(n^{-1/2}Q'(\varvec{\beta }_0)^{\top }\mathbf{v}\) is dominated by \(\frac{1}{2n}\mathbf{v}^{\top }Q''(\varvec{\beta }^{*})\mathbf{v}\) uniformly in \(\Vert \mathbf{v}\Vert =C\) for a sufficiently large C. Thus, (6.13) holds; that is, with probability approaching 1, there exists a local minimizer \({{\widehat{\varvec{\beta }}}}\) such that \(\Vert {{\widehat{\varvec{\beta }}}}-\varvec{\beta }_0 \Vert =O_p(n^{-1/2})\).

Now we prove the asymptotic normality of \({{\widehat{\varvec{\beta }}}}\). Note that \({{\widehat{\varvec{\beta }}}}\) is the solution to \(Q'({{\widehat{\varvec{\gamma }}}}(\varvec{\beta }),\varvec{\beta })\equiv Q'(\varvec{\beta }) =0\). Taking the Taylor expansion of \(Q'(\varvec{\beta })\) at \(\varvec{\beta }_0\), we get

$$\begin{aligned} Q'({{\widehat{\varvec{\beta }}}})=Q'(\varvec{\beta }_0) +Q''({\widetilde{\varvec{\beta }}})^{\top }({{\widehat{\varvec{\beta }}}}-\varvec{\beta }_0)+o(({{\widehat{\varvec{\beta }}}}-\varvec{\beta }_0)^{2}) \end{aligned}$$
(6.19)

where \({\widetilde{\varvec{\beta }}}\) lies between \(\varvec{\beta }_0\) and \({{\widehat{\varvec{\beta }}}}\). Following the assumptions (A1)–(A4) and similar discussions in (6.18), we get

$$\begin{aligned} \frac{1}{2n}Q''({\widetilde{\varvec{\beta }}})={\varvec{\Sigma }}_{\varvec{\beta }} (1+o_{p}(1)). \end{aligned}$$

Applying (6.15) and \(\Vert {{\widehat{\varvec{\beta }}}}-\varvec{\beta }_0 \Vert =O_p(n^{-1/2})\), we rewrite (6.19) as

$$\begin{aligned}&\frac{2}{n}[\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)-E_{\mathcal {A}}\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)]^{\top }\mathbf{R}^{\top }\mathbf{1}_p +\frac{2}{n}[\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)-E_{\mathcal {A}}\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)]^{\top }{\varvec{\varepsilon }}\\&\quad =2{\varvec{\Sigma }}_{\varvec{\beta }}({{\widehat{\varvec{\beta }}}}-\varvec{\beta }_0)[1+o_p(1)]+o(n^{-1}), \end{aligned}$$

and further

$$\begin{aligned}&\frac{1}{\sqrt{n}}[\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)-E_{\mathcal {A}}\mathbf{g}'(\mathbf{X}, \varvec{\beta }_0)]^{\top }{\varvec{\varepsilon }}\\&\quad ={\varvec{\Sigma }}_{\varvec{\beta }}\sqrt{n}({{\widehat{\varvec{\beta }}}}-\varvec{\beta }_0)[1+o_p(1)]. \end{aligned}$$

Hence, Slutsky's theorem leads to

$$\begin{aligned} \sqrt{n}({{\widehat{\varvec{\beta }}}}-\varvec{\beta }_0) \rightarrow _d N(0,\sigma ^{2}{\varvec{\Sigma }}_{\varvec{\beta }}^{-1}) \end{aligned}$$

and the proof of Theorem 1 is completed. \(\square \)
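The root-n rate established in Theorem 1 can be illustrated on a stripped-down special case. The Python sketch below is a hedged toy: it drops the additive component entirely and fits \(y=\exp (\beta x)+\varepsilon \) by Gauss–Newton, so it mimics only the parametric part of the profile nonlinear least squares step; the model, sample size, and noise level are our own choices.

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, n, sigma = 0.5, 2000, 0.1
x = rng.uniform(0, 1, n)
y = np.exp(beta0 * x) + sigma * rng.standard_normal(n)

# Gauss-Newton for the nonlinear least squares problem min_b sum (y_i - exp(b x_i))^2
b = 0.0
for _ in range(20):
    g  = np.exp(b * x)          # g(x, b)
    gp = x * g                  # dg/db, the "derivative design"
    b += (gp @ (y - g)) / (gp @ gp)

assert abs(b - beta0) < 0.05    # estimation error is of order n^{-1/2}
```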

Proof of Theorem 2

Firstly, we prove the consistency of \({\widehat{\varvec{\gamma }}}\). Note that

$$\begin{aligned} {\widehat{\varvec{\gamma }}}-\varvec{\gamma }_{0}&=(\mathbf{Z}^{\top }\mathbf{Z})^{-1}\mathbf{Z}^{\top }(\mathbf{g}(\mathbf{X},\varvec{\beta }_0)-\mathbf{g}(\mathbf{X},{{\widehat{\varvec{\beta }}}})) +(\mathbf{Z}^{\top }\mathbf{Z})^{-1}\mathbf{Z}^{\top }\mathbf{R}^{\top }\mathbf{1}_p+(\mathbf{Z}^{\top }\mathbf{Z})^{-1}\mathbf{Z}^{\top }{\varvec{\varepsilon }}\\&{\mathop {=}\limits ^\mathrm{def}}J_1+J_2+J_3. \end{aligned}$$

Simple calculations show that \(J_1=O_p(\frac{\sqrt{L}}{n})=o_p(n^{-1/2})\), \(J_2=O_p(n^{-1}L^{\frac{1}{2}-r})=o_p(n^{-1})\) and \(J_3=O_p(\sqrt{L/n})\); thus \(J_3\) is the dominant term, which leads to \(\Vert {\widehat{\varvec{\gamma }}} -\varvec{\gamma }_0 \Vert =O_p(\sqrt{L/n})\). Moreover,

$$\begin{aligned} \Vert {{\widehat{\alpha }}}_{k}(\cdot )-\alpha _{0k}(\cdot ) \Vert ^{2}&\le 2\int _{0}^{1}( \mathbf{B}^{\top }(u_{k}){\widehat{\varvec{\gamma }}}_{k}-\mathbf{B}^{\top }(u_{k})\varvec{\gamma }_{0k} )^{2}du_{k}+2\int _{0}^{1}R_{k}(u_{k})^{2}du_{k}\\&=O_p(L/n)+O_p(L^{-2r}). \end{aligned}$$

Then, we finish the proof. \(\square \)
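The rate \(O_p(L/n)+O_p(L^{-2r})\) above exhibits the usual variance-bias trade-off in the spline dimension L: minimizing \(L/n+L^{-2r}\) over L by calculus gives \(L^*=(2rn)^{1/(2r+1)}\) and hence the overall rate \(n^{-2r/(2r+1)}\). A short Python check (with illustrative constants; the true rate involves multiplicative constants this toy ignores):

```python
# variance term L/n grows with L, squared-bias term L^(-2r) shrinks with L;
# the two balance at L* = (2 r n)^(1/(2r+1))
r, n = 2, 100_000
rate = lambda L: L / n + L ** (-2 * r)

L_star = (2 * r * n) ** (1 / (2 * r + 1))           # calculus minimiser
L_best = min(range(1, 200), key=lambda L: rate(L))  # brute force over integers

assert abs(L_best - L_star) <= 1.0
```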

Proof of Theorem 3

Note that

$$\begin{aligned} \mathbf{D}^{\top }(u_0)\mathbf{W}(u_0)\mathbf{D}(u_0) =\frac{1}{n}\sum _{i=1}^{n}K_h(u_{ik}-u_0) \begin{pmatrix} 1\\ u_{ik}-u_0 \end{pmatrix} \begin{pmatrix} 1&u_{ik}-u_0 \end{pmatrix} \end{aligned}$$

which may be singular in some cases, resulting in non-invertibility. A common way to solve this issue is to insert an identity matrix \(\mathbf{I}_{2\times 2}=\mathbf{G}_n^{-1}\mathbf{G}_n\) with \(\mathbf{G}_n=\mathrm {diag}(1, h^{-2})\); then

$$\begin{aligned} \begin{aligned}&(\breve{\alpha }_k(u_0),\breve{\alpha }'_k(u_0))^\top \\&\quad = \left[ \frac{1}{n}\sum _{i=1}^{n}K_h(u_{ik}-u_0) \begin{pmatrix} 1\\ (u_{ik}-u_0)/h^2 \end{pmatrix} \right. \\&\left. \qquad \begin{pmatrix} 1&u_{ik}-u_0 \end{pmatrix} \right] ^{-1} \frac{1}{n}\sum _{i=1}^{n}K_h(u_{ik}-u_0) \begin{pmatrix} 1\\ (u_{ik}-u_0)/h^2 \end{pmatrix}\\&\qquad \times \left[ g(\mathbf{x}_i,\varvec{\beta })+\alpha _k(u_{ik}) +\underset{k'\ne k}{\sum } \alpha _{k'}(u_{ik'})+\varepsilon _i-g(\mathbf{x}_i,{{\widehat{\varvec{\beta }}}}) -\underset{k'\ne k}{\sum } {{\widehat{\alpha }}}_{k'}(u_{ik'}) \right]. \end{aligned} \end{aligned}$$

Taking the Taylor expansion leads to

$$\begin{aligned} \alpha _k(u_{ik})&=\begin{pmatrix} 1&u_{ik}-u_0\end{pmatrix}\begin{pmatrix} \alpha _k(u_0)\\ \alpha _k'(u_0)\end{pmatrix}+\frac{1}{2}\alpha _k''(u_0)(u_{ik}-u_0)^2+R_m(u_{ik},u_0) \end{aligned}$$

where \(R_m(u_{ik},u_0)\) is the remainder, consequently,

$$\begin{aligned} \begin{aligned}&(\breve{\alpha }_k(u_0),\breve{\alpha }'_k(u_0))^\top -(\alpha _k(u_0),\alpha '_k(u_0))^\top \\&\quad =\left[\frac{1}{n}\sum _{i=1}^{n}K_h(u_{ik}-u_0) \begin{pmatrix} 1 &{} u_{ik}-u_0\\ (u_{ik}-u_0)/h^2 &{} (u_{ik}-u_0)^2/h^2 \end{pmatrix} \right]^{-1}\\&\qquad \times \left\{ \frac{1}{n}\sum _{i=1}^{n}K_h(u_{ik}-u_0) \begin{pmatrix} 1\\ (u_{ik}-u_0)/h^2 \end{pmatrix} \right. \\&\quad \qquad \left. \left[ \frac{1}{2}\alpha _k''(u_0)(u_{ik}-u_0)^2+R_m(u_{ik},u_0)+g(\mathbf{x}_i,\varvec{\beta })-g(\mathbf{x}_i,{{\widehat{\varvec{\beta }}}}) +\underset{k'\ne k}{\sum }( \alpha _{k'}(u_{ik'})-{{\widehat{\alpha }}}_{k'}(u_{ik'}) )+\varepsilon _i \right] \right\} \\&\quad {\mathop {=}\limits ^\mathrm{def}}[A_0]^{-1}( A_1+A_2+A_3+A_4 )+(s.o.) \end{aligned} \end{aligned}$$

where

$$\begin{aligned} A_0= & {} \frac{1}{n}\sum _{i=1}^{n}K_h(u_{ik}-u_0) \begin{pmatrix} 1 &{} u_{ik}-u_0\\ (u_{ik}-u_0)/h^2 &{} (u_{ik}-u_0)^2/h^2 \end{pmatrix}, \\ A_1= & {} \frac{1}{2n}\sum _{i=1}^{n}K_h(u_{ik}-u_0) \begin{pmatrix} 1\\ (u_{ik}-u_0)/h^2 \end{pmatrix} \alpha _k''(u_0)(u_{ik}-u_0)^2, \\ A_2= & {} \frac{1}{n}\sum _{i=1}^{n}K_h(u_{ik}-u_0) \begin{pmatrix} 1\\ (u_{ik}-u_0)/h^2 \end{pmatrix} (g(\mathbf{x}_i,\varvec{\beta })-g(\mathbf{x}_i,{{\widehat{\varvec{\beta }}}})) ,\\ A_3= & {} \frac{1}{n}\sum _{i=1}^{n}K_h(u_{ik}-u_0) \begin{pmatrix} 1\\ (u_{ik}-u_0)/h^2 \end{pmatrix} \underset{k'\ne k}{\sum } ( \alpha _{k'}(u_{ik'})-{{\widehat{\alpha }}}_{k'}(u_{ik'})), \\ A_4= & {} \frac{1}{n}\sum _{i=1}^{n}K_h(u_{ik}-u_0) \begin{pmatrix}1\\ (u_{ik}-u_0)/h^2\end{pmatrix}\varepsilon _i, \end{aligned}$$

and the term (s.o) is

$$\begin{aligned}{}[ A_0 ]^{-1}\frac{1}{n}\sum _{i=1}^{n}K_h(u_{ik}-u_0) \begin{pmatrix} 1\\ (u_{ik}-u_0)/h^2 \end{pmatrix} R_m(u_{ik},u_0), \end{aligned}$$

whose order is smaller than that of \([ A_0 ]^{-1}A_1\); thus

$$\begin{aligned} \sqrt{nh}\mathbf{H}_n\left[ \begin{pmatrix} \breve{\alpha }_k(u_0)\\ \breve{\alpha }'_k(u_0) \end{pmatrix} - \begin{pmatrix} \alpha _k(u_0)\\ \alpha _k'(u_0) \end{pmatrix} \right] =\sqrt{nh}\mathbf{H}_n[ A_0]^{-1}\{ A_1+A_2+A_3+A_4 \}+(s.o). \end{aligned}$$

For ease of notation, write

$$\begin{aligned} \mathbf{R}=\mathrm {diag}(\mathbf{Q}^{-1})= \begin{pmatrix}1/f(u_0)&{} 0\\ 0 &{} 1/(\kappa _2f(u_0)) \end{pmatrix} \quad \mathrm {and} \quad \mathbf{V}=\begin{pmatrix} \zeta _0 \sigma ^2f(u_0) &{} 0\\ 0 &{} \zeta _2\sigma ^2f(u_0) \end{pmatrix}. \end{aligned}$$

Combining Lemma 2 with Lemmas 2.1–2.3 in Li and Racine (2007), we get

$$\begin{aligned} \sqrt{nh}\mathbf{H}_n[ A_0 ]^{-1}\{ A_1+A_2+A_3+A_4 \}= & {} \sqrt{nh}\mathbf{H}_n\mathbf{Q}^{-1} \{ A_1+A_2+A_3+A_4 \}+o_p(1) \\ \sqrt{nh}\mathbf{H}_n\mathbf{Q}^{-1}\{ A_1+A_4 \}= & {} \mathbf{R}\sqrt{nh}\mathbf{H}_n\{ A_1+A_4 \}+o_p(1) \\ \sqrt{nh}\mathbf{H}_nA_1= & {} \begin{pmatrix} \sqrt{nh}(\kappa _2/2)f(u_0)h^2\alpha ''_k(u_0)\\ 0 \end{pmatrix} \\&+o_p(1) \, \mathrm { and} \, \sqrt{nh}\mathbf{H}_n A_4 \rightarrow _d N(0,\mathbf{V}) \end{aligned}$$

based on \(g(\mathbf{x}_i,\varvec{\beta })-g(\mathbf{x}_i,{{\widehat{\varvec{\beta }}}})=g'(\mathbf{x}_i,{{\widehat{\varvec{\beta }}}})(\varvec{\beta }-{{\widehat{\varvec{\beta }}}}) +o_p((\varvec{\beta }-{{\widehat{\varvec{\beta }}}})^2)\) and \(\Vert {{\widehat{\varvec{\beta }}}}-\varvec{\beta }\Vert =O_p(n^{-1/2})\). Then, \(\sqrt{nh}\mathbf{H}_n\mathbf{Q}^{-1}A_2=o_p(1)\) and \(\sqrt{nh}\mathbf{H}_n\mathbf{Q}^{-1}A_3=o_p(1)\). The result follows directly; when \(\mathbf{D}^{\top }(u_0)\mathbf{W}(u_0)\mathbf{D}(u_0)\) is nonsingular, the proof can be derived similarly. \(\square \)

Proof of Theorem 4

For ease of notation, let \(\Vert g\Vert _\infty =\sup _{x\in [0,1]}|g(x)|\) for a function g(x), and \(\Vert \mathbf{A}\Vert _\infty =( \sum _{i=1}^{p}\sum _{j=1}^{p}\Vert a_{ij} \Vert _\infty ^2 )^{1/2}\) for a matrix \(\mathbf{A}(x)=(a_{ij}(x))_p\). A calculation similar to that in Theorem 3 leads to

$$\begin{aligned} \breve{\alpha }_k(u_0)-\alpha _k(u_0)-b(u_0)&=(\mathbf{D}^{\top }(u_0)\mathbf{W}(u_0)\mathbf{D}(u_0))^{-1}\mathbf{D}^{\top }(u_0)\mathbf{W}(u_0){\varvec{\varepsilon }}+o_p(1)\\&{\mathop {=}\limits ^\mathrm{def}}I_1(u_0)+o_p(1). \end{aligned}$$

The use of Lemma 3 results in

$$\begin{aligned} n\mathbf{H}_n(\mathbf{D}^{\top }(u_0)\mathbf{W}(u_0)\mathbf{D}(u_0))^{-1}\mathbf{H}_n=f^{-1}(u_0){\varvec{\Omega }}^{-1} +O_p(h+(\log n/nh)^{1/2}) \end{aligned}$$
(6.20)

with \({\varvec{\Omega }}=\begin{pmatrix}1&{}0\\ 0&{}\kappa _2\end{pmatrix}\) and

$$\begin{aligned} \Vert \frac{1}{n} \mathbf{H}_n^{-1}\mathbf{D}^{\top }(u_0)\mathbf{W}(u_0){\varvec{\varepsilon }}\Vert _{\infty } =O_p((\log n/nh)^{1/2}), \end{aligned}$$
(6.21)

consequently,

$$\begin{aligned}&\Vert I_1(u_0)-\frac{1}{nf(u_0)}(1,0){\varvec{\Omega }}^{-1}\mathbf{H}_n^{-1}\mathbf{D}^{\top }(u_0)\mathbf{W}(u_0){\varvec{\varepsilon }}\Vert _\infty \\&\quad =O_p(h(\log n/nh)^{1/2}+(\log n/nh)). \end{aligned}$$

Furthermore, we denote

$$\begin{aligned} I_2(u_0)&=\frac{1}{nf(u_0)}(1,0){\varvec{\Omega }}^{-1}\mathbf{H}_n^{-1}\mathbf{D}^{\top }(u_0)\mathbf{W}(u_0){\varvec{\varepsilon }}=\frac{1}{nf(u_0)}\sum _{i=1}^{n}K_h(u_{ik}-u_0)\varepsilon _i \end{aligned}$$

and apply Theorem 1 and Lemma 1 of Fan and Zhang (2000); then, for \(h=n^{-\rho }\) with \(1/5\le \rho \le 1/3\),

$$\begin{aligned} \lim _{n\rightarrow \infty }P\{ (-2\log h)^{1/2}( \Vert ({nh}/{\sigma ^2_{\alpha }})^{1/2}I_2(u_0) \Vert _\infty -d_n )<z \}=\exp (-2\exp (-z)). \end{aligned}$$

Thus, the proof is completed. \(\square \)
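The Gumbel-type limit above is what makes the simultaneous band computable: a level \(1-\alpha \) band uses the critical value \(z_\alpha \) solving \(\exp (-2\exp (-z))=1-\alpha \). A minimal Python helper (the function name is our own; the full band in the text additionally requires the centering constant \(d_n\) and the estimated variance \(\sigma ^2_{\alpha }\)):

```python
import math

def band_critical_value(alpha):
    """z solving exp(-2 exp(-z)) = 1 - alpha, i.e. the Gumbel-type quantile in Theorem 4."""
    return -math.log(-math.log(1 - alpha) / 2)

z05 = band_critical_value(0.05)
# sanity check: plugging z back in recovers the nominal coverage
assert abs(math.exp(-2 * math.exp(-z05)) - 0.95) < 1e-12
```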

Proof of Theorem 5

It suffices to establish the convergence rates of the estimated bias and variance of \(\breve{\alpha }_k(u_0)\). First, result (6.20) and the proof of Theorem 4 yield

$$\begin{aligned} \Vert {\widehat{bias}}(\breve{\alpha }_k(u_0)|\mathcal {D})-b(u_0) \Vert _\infty =O_p(h^2(\sqrt{\log n/nh_{*}^5}))=O_p(h^2(n^{-1/7}\log ^{1/2}n)) \end{aligned}$$
(6.22)

with \(h_{*}=O(n^{-1/7})\). Besides, Lemma 3 and arguments similar to those for (6.21) lead to

$$\begin{aligned} \Vert \frac{h}{n}\mathbf{H}_n^{-1}(\mathbf{D}^{\top }(u_0)\mathbf{W}(u_0)\mathbf{W}(u_0)\mathbf{D}(u_0))\mathbf{H}_n^{-1} -f(u_0){\varvec{\Lambda }}\Vert _\infty =o_p(1) \end{aligned}$$

with \({\varvec{\Lambda }}=\begin{pmatrix}\kappa _0&{}0\\ 0&{}\kappa _2 \end{pmatrix}\). Moreover, \(\Vert {\widehat{\sigma }}^2-\sigma ^2 \Vert _\infty =o_p(1)\) is easy to derive. Thus, the results above, together with Theorem 2, indicate that for each \(u_0 \in [0,1]\),

$$\begin{aligned} \Vert nh\widehat{\mathrm {var}}\{ \breve{\alpha }_k(u_0)|\mathcal {D} \} -\sigma ^2_{\alpha } \Vert _\infty =o_p(1). \end{aligned}$$
(6.23)

Finally, the combination of (6.22), (6.23) and Theorem 4 shows the conclusion. \(\square \)

About this article

Cite this article

Li, R., Zhang, Y. Two-stage estimation and simultaneous confidence band in partially nonlinear additive model. Metrika 84, 1109–1140 (2021). https://doi.org/10.1007/s00184-021-00808-3
