Nonlinear surface regression with dimension reduction method

Abstract

This paper considers nonlinear regression analysis with a scalar response and multiple predictors. The unknown regression function is approximated by radial basis function models, and the coefficients are estimated in the context of M-estimation. It is known that ordinary M-estimation leads to overfitting in nonlinear regression. The purpose of this paper is to construct a smooth estimator via a two-step procedure. First, sufficient dimension reduction methods are applied to the response and the radial basis functions, transforming the large number of radial bases into a small number of their linear combinations without loss of information. In the second step, a multiple linear regression model between the response and the transformed radial bases is assumed and ordinary M-estimation is applied. The final estimator is thus again a linear combination of the radial bases. The validity and the asymptotic behavior of the proposed method are studied. A simulation study and a data example confirm the behavior of the proposed method.
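
To make the two-step procedure concrete, the following is a minimal sketch, assuming Gaussian radial basis functions, SIR for the dimension reduction step, and a Huber loss for the M-estimation step; all function names, tuning constants, and the small ridge term are illustrative choices, not the paper's implementation.

```python
# Illustrative two-step fit: RBF features -> SIR reduction -> Huber M-estimation.
import numpy as np
from scipy.linalg import eigh
from scipy.optimize import minimize

def rbf_features(X, centers, bandwidth):
    """Gaussian radial bases phi_k(x) = exp(-||x - c_k||^2 / (2 h^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def sir_directions(Phi, y, n_slices=10, n_dir=3, ridge=1e-6):
    """Step 1: SIR applied to the radial bases, returning the leading directions."""
    n, K = Phi.shape
    mu = Phi.mean(axis=0)
    G = np.cov(Phi, rowvar=False) + ridge * np.eye(K)    # Sigma (+ small ridge)
    M = np.zeros((K, K))
    for s in np.array_split(np.argsort(y), n_slices):    # slice on the response
        diff = Phi[s].mean(axis=0) - mu
        M += (len(s) / n) * np.outer(diff, diff)          # covariance of slice means
    _, vecs = eigh(M, G)                                  # solves M b = lambda G b
    return vecs[:, ::-1][:, :n_dir]                       # columns: leading directions

def huber_fit(Z, y, delta=1.345):
    """Step 2: M-estimation of y ~ [1, Z] w with the Huber loss."""
    Z1 = np.column_stack([np.ones(len(y)), Z])
    def loss(w):
        r = y - Z1 @ w
        return np.where(np.abs(r) <= delta,
                        0.5 * r ** 2,
                        delta * (np.abs(r) - 0.5 * delta)).sum()
    w0 = np.linalg.lstsq(Z1, y, rcond=None)[0]            # least squares start
    return minimize(loss, w0, method="BFGS").x

# Usage sketch: Phi = rbf_features(X, centers, h); B = sir_directions(Phi, y)
# w = huber_fit(Phi @ B, y); Shat = np.column_stack([np.ones(len(y)), Phi @ B]) @ w
```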

References

  • Alimadad, A., Salibian-Barrera, M.: An outlier-robust fit for generalized additive models with applications to disease outbreak detection. J. Am. Stat. Assoc. 106, 719–731 (2011)

  • Buja, A., Hastie, T., Tibshirani, R.: Linear smoothers and additive models. Ann. Stat. 17, 453–555 (1989)

  • Chen, X., Zhou, C., Cook, R.D.: Coordinate-independent sparse sufficient dimension reduction and variable selection. Ann. Stat. 38, 3696–3723 (2010)

  • Cook, R.D., Weisberg, S.: Discussion of "Sliced inverse regression for dimension reduction" by K.C. Li. J. Am. Stat. Assoc. 86, 328–332 (1991)

  • Cook, R.D.: Principal Hessian directions revisited. J. Am. Stat. Assoc. 93, 84–100 (1998)

  • Cook, R.D., Li, B.: Dimension reduction for the conditional mean in regression. Ann. Stat. 30, 455–474 (2002)

  • de Boor, C.: A Practical Guide to Splines, revised edn. Springer, Berlin (2001)

  • Diaconis, P., Freedman, D.: Asymptotics of graphical projection pursuit. Ann. Stat. 12, 793–815 (1984)

  • Ganguli, B., Wand, M.P.: Feature significance in geostatistics. J. Comput. Gr. Stat. 13, 954–973 (2004)

  • Gu, C.: Smoothing Spline ANOVA Models. Springer, Berlin (2002)

  • Green, P.J., Silverman, B.W.: Nonparametric Regression and Generalized Linear Models. Chapman and Hall, London (1993)

  • Hall, P., Li, K.C.: On almost linearity of low dimensional projections from high dimensional data. Ann. Stat. 21, 867–889 (1993)

  • Hsing, T., Carroll, R.J.: An asymptotic theory for sliced inverse regression. Ann. Stat. 20, 1040–1061 (1992)

  • Huber, P., Ronchetti, E.M.: Robust Statistics, 2nd edn. Wiley, Hoboken (2009)

  • Hubert, M., Rousseeuw, P.J., Branden, K.V.: ROBPCA: a new approach to robust principal component analysis. Technometrics 47, 64–79 (2005)

  • Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46, 33–50 (1978)

  • Lee, T.C.M., Oh, H.: Robust penalized regression spline fitting with application to additive mixed modeling. Comput. Stat. 22, 159–171 (2007)

  • Li, K.C.: Sliced inverse regression for dimension reduction. J. Am. Stat. Assoc. 86, 316–327 (1991)

  • Li, K.C.: On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. J. Am. Stat. Assoc. 87, 1025–1039 (1992)

  • Li, B., Zha, H., Chiaromonte, F.: Contour regression: a general approach to dimension reduction. Ann. Stat. 33, 1580–1616 (2005)

  • Li, B., Wang, S.: On directional regression for dimension reduction. J. Am. Stat. Assoc. 102, 997–1008 (2007)

  • Li, L.: Sparse sufficient dimension reduction. Biometrika 94, 603–611 (2007)

  • Li, Y., Zhu, L.X.: Asymptotics for sliced average variance estimation. Ann. Stat. 35, 41–69 (2007)

  • Li, F., Villani, M.: Efficient Bayesian multivariate surface regression. Scand. J. Stat. 40, 706–723 (2013)

  • Nychka, D.W.: Spatial process estimates as smoothers. In: Schimek, M. (ed.) Smoothing and Regression. Springer, Heidelberg (2000)

  • Pollard, D.: Asymptotics for least absolute deviation regression estimators. Econom. Theory 7, 186–199 (1991)

  • Ruppert, D., Wand, M.P., Carroll, R.J.: Semiparametric Regression. Cambridge University Press, Cambridge (2003)

  • Wahba, G., Wang, Y., Gu, C., Klein, R., Klein, B.: Smoothing spline ANOVA for exponential families, with application to the Wisconsin epidemiological study of diabetic retinopathy. Ann. Stat. 23, 1865–1895 (1995)

  • Wong, R.K., Yao, F., Lee, T.C.: Robust estimation for generalized additive models. J. Comput. Gr. Stat. 23, 270–289 (2014)

  • Wood, S.N.: Thin plate regression splines. J. R. Stat. Soc. B 65, 95–114 (2003)

  • Wood, S.N.: Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC, Boca Raton (2006)

  • Xiao, L., Li, Y., Ruppert, D.: Fast bivariate \(P\)-splines: the sandwich smoother. J. R. Stat. Soc. B 75, 577–599 (2013)

  • Zhu, L., Miao, B., Peng, H.: On sliced inverse regression with high-dimensional covariates. J. Am. Stat. Assoc. 101, 630–643 (2006)

Acknowledgments

The author is grateful to the Editor, the Associate Editor, and two anonymous referees for their valuable comments and suggestions, which led to improvements in the paper. This research was partly supported by KAKENHI 26730019.

Author information

Correspondence to Takuma Yoshida.

Appendices

Appendix 1: Examples of sufficient dimension reduction methods

We describe here the estimation algorithm for obtaining \(\{\hat{{\varvec{\beta }}}_1,\ldots ,\hat{{\varvec{\beta }}}_J\}\) in Sect. 2. Several sufficient dimension reduction methods can be formulated as the generalized eigenvalue problem:

$$\begin{aligned} M {\varvec{\beta }}_j=\lambda _j G {\varvec{\beta }}_j, \quad j=1,\ldots ,J, \end{aligned}$$
(11)

where M is a \(K\times K\) symmetric and nonnegative definite matrix, G is a \(K\times K\) positive definite matrix, the vectors \({\varvec{\beta }}_1,\ldots ,{\varvec{\beta }}_K\) are eigenvectors satisfying \({\varvec{\beta }}_i^TG{\varvec{\beta }}_j=1\) for \(i=j\) and 0 otherwise, and \(\lambda _1\ge \cdots \ge \lambda _K\ge 0\) are the corresponding eigenvalues. By specifying M and G, several dimension reduction methods, such as SIR, SAVE and PHD, are obtained. Let \(\varSigma =E[\{{\varvec{\phi }}(X)-E[{\varvec{\phi }}(X)]\}\{{\varvec{\phi }}(X)-E[{\varvec{\phi }}(X)]\}^T]\) denote the covariance matrix of \({\varvec{\phi }}(X)\) and let \({\varvec{Z}}=\varSigma ^{-1/2}({\varvec{\phi }}(X)-E[{\varvec{\phi }}(X)])\) be the standardized version of \({\varvec{\phi }}(X)\). When \(M=Cov(E[{\varvec{\phi }}(X)-E[{\varvec{\phi }}(X)]|y])\) and \(G=\varSigma \) are used, (11) yields SIR. SAVE corresponds to the case \(M=\varSigma ^{1/2}E[\{I-Cov({\varvec{Z}}|y)\}^2]\varSigma ^{1/2}\) and \(G=\varSigma \), where I is the identity matrix. For PHD, M and G are chosen as \(M=\varSigma ^{1/2}\varSigma _{yz}^2\varSigma ^{1/2}\) and \(G=\varSigma \), where \(\varSigma _{yz}=E[\{Y-E[Y]\}{\varvec{Z}}{\varvec{Z}}^T]\). Regularized versions of SIR, SAVE and PHD with a ridge penalty are obtained by replacing G with \(\varSigma +\gamma I\), where \(\gamma >0\) is a tuning parameter; in this case the tuning parameter must be selected in practice. Other methods are summarized by Li (2007) and Chen et al. (2010). The estimator \(\{\hat{{\varvec{\beta }}}_1,\ldots ,\hat{{\varvec{\beta }}}_J\}\) of \(\{{\varvec{\beta }}_1,\ldots ,{\varvec{\beta }}_J\}\) is obtained from (11) with M and G replaced by their sample versions, written as \(\hat{M}\) and \(\hat{G}\). The corresponding estimation algorithms are given in Li (1991), Cook and Weisberg (1991), Li (1992) and Cook (1998).
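
As an illustration of how the sample version of (11) can be solved, here is a small sketch using the SAVE choice of M with slicing on the response; the slicing scheme, the ridge option and the function name are illustrative assumptions rather than the paper's implementation.

```python
# Sketch of the sample version of (11) with the SAVE candidate matrix M;
# G is the (optionally ridge-regularized) sample covariance of the radial bases.
import numpy as np
from scipy.linalg import eigh, inv, sqrtm

def save_directions(Phi, y, n_slices=10, n_dir=3, ridge=0.0):
    n, K = Phi.shape
    Sigma = np.cov(Phi, rowvar=False)
    S_half = np.real(sqrtm(Sigma))                        # Sigma^{1/2}
    Z = (Phi - Phi.mean(axis=0)) @ np.real(inv(S_half))   # standardized bases
    A = np.zeros((K, K))
    I = np.eye(K)
    for s in np.array_split(np.argsort(y), n_slices):     # slice on the response
        C = np.cov(Z[s], rowvar=False)                    # Cov(Z | slice)
        A += (len(s) / n) * (I - C) @ (I - C)             # E[{I - Cov(Z|y)}^2]
    M_hat = S_half @ A @ S_half
    G_hat = Sigma + ridge * np.eye(K)                     # G = Sigma (+ gamma I)
    _, vecs = eigh(M_hat, G_hat)                          # hat{M} b = lambda hat{G} b
    return vecs[:, ::-1][:, :n_dir]                       # hat{beta}_1, ..., hat{beta}_J
```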

Appendix 2: Proof of Proposition 1 and Theorem 1

Proof of Proposition 1

Let \(U_i=Y_i-{\varvec{\phi }}({\varvec{x}}_i)^TB_0^T\tilde{{\varvec{w}}}\), \(A_n=\sqrt{n}(B_0-\hat{B})\) and let

$$\begin{aligned} Q_{in}({\varvec{t}})= & {} \rho \left( U_i+\frac{1}{\sqrt{n}}{\varvec{\phi }}({\varvec{x}}_i)^TA_n^T\tilde{{\varvec{w}}}-\frac{1}{\sqrt{n}}{\varvec{\phi }}({\varvec{x}}_i)^TB_0^T{\varvec{t}}+\frac{1}{n}{\varvec{\phi }}({\varvec{x}}_i)^TA_n^T{\varvec{t}}\right) \\&-\,\rho (U_i). \end{aligned}$$

and \(Q_{n}({\varvec{t}})=\sum _{i=1}^nQ_{in}({\varvec{t}})\). Then the minimizer of \(Q_n\) is easily shown to be

$$\begin{aligned} \hat{{\varvec{t}}}=\sqrt{n}(\hat{{\varvec{w}}}-\tilde{{\varvec{w}}}). \end{aligned}$$

Indeed, we have \(Q_n(\hat{{\varvec{t}}})=\sum _{i=1}^n \{\rho (Y_i-{\varvec{\phi }}({\varvec{x}}_i)^T\hat{B}^T\hat{{\varvec{w}}})-\rho (U_i)\}\). For \(i=1,\ldots ,n\), let

$$\begin{aligned} \alpha _{in}({\varvec{t}})=\frac{1}{\sqrt{n}}{\varvec{\phi }}({\varvec{x}}_i)^TA_n^T\tilde{{\varvec{w}}}-\frac{1}{\sqrt{n}}{\varvec{\phi }}({\varvec{x}}_i)^TB_0^T{\varvec{t}}+\frac{1}{n}{\varvec{\phi }}({\varvec{x}}_i)^TA_n^T{\varvec{t}}. \end{aligned}$$

Since \(\alpha _{in}({\varvec{t}})=O_P(n^{-1/2})\), a Taylor expansion of \(Q_n({\varvec{t}})\) around \(\alpha _{in}({\varvec{t}})=0\) yields

$$\begin{aligned} Q_n({\varvec{t}})=\sum _{i=1}^n\rho ^\prime (U_i)\alpha _{in}({\varvec{t}})+\frac{1}{2}\sum _{i=1}^n \rho ^{\prime \prime }(U_i)\alpha _{in}({\varvec{t}})^2 + o_P(n^{-1}) \end{aligned}$$

Let \({\varvec{W}}_n=n^{-1/2}\sum _{i=1}^n \rho ^\prime (U_i){\varvec{\phi }}({\varvec{x}}_i)\). Then, by Lyapunov’s central limit theorem, \({\varvec{W}}_n\) is asymptotically normal with mean \({\varvec{0}}\) and variance

$$\begin{aligned} \displaystyle \lim _{n\rightarrow \infty } n^{-1}\displaystyle \sum _{i=1}^n V[\rho ^\prime (U_i)^2]{\varvec{\phi }}({\varvec{x}}_i){\varvec{\phi }}({\varvec{x}}_i)^T. \end{aligned}$$

Assumption 3 (b) and (c) are needed for this to hold. Thus, we have \({\varvec{W}}_n=O_P(1)\). Writing \(\varGamma _n=n^{-1}\sum _{i=1}^n \rho ^{\prime \prime }(U_i){\varvec{\phi }}({\varvec{x}}_i){\varvec{\phi }}({\varvec{x}}_i)^T\), \(E[\varGamma _n]\) is bounded by Assumption 3 (d). Hence \(Q_n\) can be expressed as

$$\begin{aligned} Q_n({\varvec{t}})= & {} -{\varvec{W}}_n^TB_0^T{\varvec{t}}+{\varvec{W}}_n^TA_n^T\tilde{{\varvec{w}}}+o_P(1) \nonumber \\&+\;\frac{1}{2}{\varvec{t}}^TB_0E[\varGamma _n]B_0^T{\varvec{t}}-\tilde{{\varvec{w}}}^TA_nE[\varGamma _n] B_0^T{\varvec{t}} \nonumber \\&+\;\frac{1}{2}\tilde{{\varvec{w}}}^TA_nE[\varGamma _n]A_n^T\tilde{{\varvec{w}}}+o_P(1). \end{aligned}$$
(12)

By the convexity argument of Pollard (1991), the minimizer of \(Q_n\) is asymptotically equivalent to the minimizer of the leading term on the right-hand side of (12), since \(Q_n\) is convex and has a unique minimizer. Therefore, as \(n\rightarrow \infty \),

$$\begin{aligned} \hat{{\varvec{t}}}-\tilde{{\varvec{t}}}\rightarrow {\varvec{0}}, \end{aligned}$$

where

$$\begin{aligned} \tilde{{\varvec{t}}}=(B_0E[\varGamma _n]B_0^T)^{-1}B_0\{{\varvec{W}}_n +E[\varGamma _n]A_n^T\tilde{{\varvec{w}}}\}. \end{aligned}$$

Consequently, \(\hat{{\varvec{t}}}=O_P(1)\) and this completes the proof. \(\square \)
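
For the reader's convenience, a brief sketch of the step yielding \(\tilde{{\varvec{t}}}\): differentiating the quadratic leading term of (12) with respect to \({\varvec{t}}\) and setting the gradient to zero gives

$$\begin{aligned} -B_0{\varvec{W}}_n+B_0E[\varGamma _n]B_0^T{\varvec{t}}-B_0E[\varGamma _n]A_n^T\tilde{{\varvec{w}}}={\varvec{0}}, \end{aligned}$$

and solving this linear system for \({\varvec{t}}\) gives the expression for \(\tilde{{\varvec{t}}}\) above. Since \({\varvec{W}}_n=O_P(1)\) and, under the assumptions of Proposition 1, \(A_n=O_P(1)\), it follows that \(\tilde{{\varvec{t}}}=O_P(1)\).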

Proof of Theorem 1

First, from (10), the proposed estimator can be expressed as

$$\begin{aligned} \hat{S}({\varvec{x}})-S({\varvec{x}})= & {} {\varvec{\phi }}({\varvec{x}})^T\hat{B}^T\hat{{\varvec{w}}}-{\varvec{\phi }}({\varvec{x}})^TB_0^T{\varvec{w}}_0 \\= & {} {\varvec{\phi }}({\varvec{x}})^T\hat{B}^T\hat{{\varvec{w}}}-{\varvec{\phi }}({\varvec{x}})^TB_0^T\tilde{{\varvec{w}}}+{\varvec{\phi }}({\varvec{x}})^TB_0^T\tilde{{\varvec{w}}}-{\varvec{\phi }}({\varvec{x}})^TB_0^T{\varvec{w}}_0 \\= & {} {\varvec{\phi }}({\varvec{x}})^T\hat{B}^T\hat{{\varvec{w}}}-{\varvec{\phi }}({\varvec{x}})^TB_0^T\tilde{{\varvec{w}}}+O_P(n^{-1/2}). \end{aligned}$$

Next we have

$$\begin{aligned} {\varvec{\phi }}({\varvec{x}})^T\hat{B}^T\hat{{\varvec{w}}}-{\varvec{\phi }}({\varvec{x}})^TB_0^T\tilde{{\varvec{w}}}= & {} {\varvec{\phi }}({\varvec{x}})^TB_0^T(\hat{{\varvec{w}}}-\tilde{{\varvec{w}}})+{\varvec{\phi }}({\varvec{x}})^T(\hat{B}-B_0)^T\hat{{\varvec{w}}} \\= & {} {\varvec{\phi }}({\varvec{x}})^T(\hat{B}-B_0)^T\hat{{\varvec{w}}}+O_P(n^{-1/2}). \end{aligned}$$

From (10) and Proposition 1, we have \(\hat{{\varvec{w}}}-{\varvec{w}}_0=O_P(n^{-1/2})\). Therefore, we obtain

$$\begin{aligned} {\varvec{\phi }}({\varvec{x}})^T(\hat{B}-B_0)^T\hat{{\varvec{w}}}= & {} {\varvec{\phi }}({\varvec{x}})^T(\hat{B}-B_0)^T(\hat{{\varvec{w}}}-{\varvec{w}}_0)+{\varvec{\phi }}({\varvec{x}})^T(\hat{B}-B_0)^T{\varvec{w}}_0\\= & {} O_P(n^{-1})+O_P(n^{-1/2}). \end{aligned}$$

Consequently, we have \(\hat{S}({\varvec{x}})-S({\varvec{x}})=O_P(n^{-1/2})\) and this completes the proof. \(\square \)

Cite this article

Yoshida, T. Nonlinear surface regression with dimension reduction method. AStA Adv Stat Anal 101, 29–50 (2017). https://doi.org/10.1007/s10182-016-0271-2
