Abstract
This paper considers nonlinear regression analysis with a scalar response and multiple predictors. The unknown regression function is approximated by radial basis function models, and the coefficients are estimated in the framework of M-estimation. It is known that ordinary M-estimation leads to overfitting in nonlinear regression. The purpose of this paper is to construct a smooth estimator via a two-step procedure. First, sufficient dimension reduction methods are applied to the response and the radial basis functions, transforming the large number of radial bases into a small number of linear combinations of the bases without loss of information. In the second step, a multiple linear regression model between the response and the transformed radial bases is assumed, and ordinary M-estimation is applied. The final estimator is thus also obtained as a linear combination of radial bases. The validity of the proposed method and its asymptotic behavior are explored. A simulation study and a data example confirm the performance of the proposed method.
References
Alimadad, A., Salibian-Barrera, M.: An outlier-robust fit for generalized additive models with applications to disease outbreak detection. J. Am. Statist. Assoc. 106, 719–731 (2011)
Buja, A., Hastie, T., Tibshirani, R.: Linear smoothers and additive models. Ann. Stat. 17, 453–555 (1989)
Chen, X., Zhou, C., Cook, R.D.: Coordinate-independent sparse sufficient dimension reduction and variable selection. Ann. Stat. 38, 3696–3723 (2010)
Cook, R.D., Weisberg, S.: Discussion of “Sliced inverse regression for dimension reduction” by K.C. Li. J. Am. Stat. Assoc. 86, 328–332 (1991)
Cook, R.D.: Principal Hessian directions revisited. J. Am. Stat. Assoc. 93, 84–100 (1998)
Cook, R.D., Li, B.: Dimension reduction for the conditional mean in regression. Ann. Stat. 30, 455–474 (2002)
de Boor, C.: A Practical Guide to Splines, revised edn. Springer, Berlin (2001)
Diaconis, P., Freedman, D.: Asymptotics of graphical projection pursuit. Ann. Stat. 12, 793–815 (1984)
Ganguli, B., Wand, M.P.: Feature significance in geostatistics. J. Comput. Gr. Stat. 13, 954–973 (2004)
Gu, C.: Smoothing Spline ANOVA Models. Springer, Berlin (2002)
Green, P.J., Silverman, B.W.: Nonparametric Regression and Generalized Linear Models. Chapman and Hall, London (1993)
Hall, P., Li, K.C.: On almost linearity of low dimensional projections from high dimensional data. Ann. Stat. 21, 867–889 (1993)
Hsing, T., Carroll, R.J.: An asymptotic theory for sliced inverse regression. Ann. Stat. 20, 1040–1061 (1992)
Huber, P., Ronchetti, E.M.: Robust Statistics, 2nd edn. Wiley, Hoboken (2009)
Hubert, M., Rousseeuw, P.J., Branden, K.V.: ROBPCA: a new approach to robust principal component analysis. Technometrics 47, 64–79 (2005)
Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46, 33–50 (1978)
Lee, T.C.M., Oh, H.: Robust penalized regression spline fitting with application to additive mixed modeling. Comput. Stat. 22, 159–171 (2007)
Li, K.C.: Sliced inverse regression for dimension reduction. J. Am. Stat. Assoc. 86, 316–327 (1991)
Li, K.C.: On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. J. Am. Stat. Assoc. 87, 1025–1039 (1992)
Li, B., Zha, H., Chiaromonte, F.: Contour regression: a general approach to dimension reduction. Ann. Stat. 33, 1580–1616 (2005)
Li, B., Wang, S.: On directional regression for dimension reduction. J. Am. Stat. Assoc. 102, 997–1008 (2007)
Li, L.: Sparse sufficient dimension reduction. Biometrika 94, 603–611 (2007)
Li, Y., Zhu, L.X.: Asymptotics for sliced average variance estimation. Ann. Stat. 35, 41–69 (2007)
Li, F., Villani, M.: Efficient Bayesian multivariate surface regression. Scand. J. Stat. 40, 706–723 (2013)
Nychka, D.W.: Spatial process estimates as smoothers. In: Schimek, M. (ed.) Smoothing and Regression. Springer, Heidelberg (2000)
Pollard, D.: Asymptotics for least absolute deviation regression estimators. Econom. Theory 7, 186–199 (1991)
Ruppert, D., Wand, M.P., Carroll, R.J.: Semiparametric Regression. Cambridge University Press, Cambridge (2003)
Wahba, G., Wang, Y., Gu, C., Klein, R., Klein, B.: Smoothing spline ANOVA for exponential families, with application to the Wisconsin epidemiological study of diabetic retinopathy. Ann. Stat. 23, 1865–1895 (1995)
Wong, R.K., Yao, F., Lee, T.C.: Robust estimation for generalized additive models. J. Comput. Gr. Stat. 23, 270–289 (2014)
Wood, S.N.: Thin plate regression splines. J. R. Stat. Soc. B. 65, 95–114 (2003)
Wood, S.N.: Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC, Boca Raton (2006)
Xiao, L., Li, Y., Ruppert, D.: Fast bivariate \(P\)-splines: the sandwich smoother. J. R. Stat. Soc. B 75, 577–599 (2013)
Zhu, L., Miao, B., Peng, H.: On sliced inverse regression with high-dimensional covariates. J. Am. Stat. Assoc. 101, 630–643 (2006)
Acknowledgments
The author is grateful to the Editor, the Associate Editor, and two anonymous referees for their valuable comments and suggestions, which led to improvements in the paper. This research was partly supported by KAKENHI 26730019.
Appendices
Appendix 1: Examples of sufficient dimension reduction methods
We describe here the estimation algorithm to obtain \(\{\hat{{\varvec{\beta }}}_1,\ldots ,\hat{{\varvec{\beta }}}_J\}\) in Sect. 2. Several sufficient dimension reduction methods can be formulated as the generalized eigenvalue problem
$$\begin{aligned} M{\varvec{\beta }}_j=\lambda _j G{\varvec{\beta }}_j,\quad j=1,\ldots ,K, \end{aligned}$$
(11)
where the \(K\times K\) matrix M is symmetric and nonnegative definite, the \(K\times K\) matrix G is positive definite, the eigenvectors \({\varvec{\beta }}_1,\ldots ,{\varvec{\beta }}_K\) satisfy \({\varvec{\beta }}_i^TG{\varvec{\beta }}_j=1\) for \(i=j\) and 0 otherwise, and \(\lambda _1\ge \cdots \ge \lambda _K\ge 0\) are the corresponding eigenvalues. By specifying M and G, several dimension reduction methods such as SIR, SAVE and PHD can be obtained. Let \(\varSigma =E[\{{\varvec{\phi }}(X)-E[{\varvec{\phi }}(X)]\}\{{\varvec{\phi }}(X)-E[{\varvec{\phi }}(X)]\}^T]\) be the covariance matrix of \({\varvec{\phi }}(X)\), and let \({\varvec{Z}}=\varSigma ^{-1/2}({\varvec{\phi }}(X)-E[{\varvec{\phi }}(X)])\) be the standardized version of \({\varvec{\phi }}(X)\). When \(M=Cov(E[{\varvec{\phi }}(X)-E[{\varvec{\phi }}(X)]|y])\) and \(G=\varSigma \) are used, (11) yields SIR. SAVE corresponds to the case \(M=\varSigma ^{1/2}E[\{I-Cov({\varvec{Z}}|y)\}^2]\varSigma ^{1/2}\) and \(G=\varSigma \), where I is the identity matrix. For PHD, M and G are selected as \(M=\varSigma ^{1/2}\varSigma _{yzz}^2\varSigma ^{1/2}\) and \(G=\varSigma \), where \(\varSigma _{yzz}=E[\{Y-E[Y]\}{\varvec{Z}}{\varvec{Z}}^T]\). The regularized versions of SIR, SAVE and PHD with a ridge penalty are obtained by modifying \(G=\varSigma +\gamma I\), where \(\gamma >0\) is a tuning parameter; in the regularized versions, the tuning parameter must be selected in practice. Other methods are summarized in Li (2007) and Chen et al. (2010). The estimator \(\{\hat{{\varvec{\beta }}}_1,\ldots ,\hat{{\varvec{\beta }}}_J\}\) of \(\{{\varvec{\beta }}_1,\ldots ,{\varvec{\beta }}_J\}\) is obtained from the sample versions of M and G, written as \(\hat{M}\) and \(\hat{G}\). The estimation algorithms are detailed in Li (1991), Cook and Weisberg (1991), Li (1992) and Cook (1998).
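To make this concrete, the following is a minimal sketch of the SIR case of (11) with the ridge-regularized \(G=\hat{\varSigma }+\gamma I\). The Gaussian basis construction, the slicing scheme, and all function names below are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of the SIR case of (11) with ridge-regularized G.
# The Gaussian basis, slicing scheme and all names here are
# illustrative assumptions, not taken from the paper.
import numpy as np
from scipy.linalg import eigh

def rbf_features(X, centers, sigma):
    """Gaussian radial basis functions phi(x), one column per center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def sir_directions(Phi, y, J, n_slices=10, gamma=0.1):
    """Estimate beta_1, ..., beta_J from M beta = lambda G beta."""
    n, K = Phi.shape
    Phi_c = Phi - Phi.mean(axis=0)                # phi(x_i) - sample mean
    G = Phi_c.T @ Phi_c / n + gamma * np.eye(K)   # Sigma-hat + ridge term
    # SIR kernel M-hat: covariance of the within-slice means of phi(x)
    M = np.zeros((K, K))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Phi_c[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Generalized eigenproblem; eigh returns eigenvalues in ascending
    # order, with eigenvectors normalized so that beta^T G beta = 1.
    _, vecs = eigh(M, G)
    return vecs[:, ::-1][:, :J]                   # J leading directions
```

Under these assumptions, the columns of the returned matrix play the role of \(\hat{{\varvec{\beta }}}_1,\ldots ,\hat{{\varvec{\beta }}}_J\), and the second-step predictors are the transformed bases \(\hat{B}{\varvec{\phi }}({\varvec{x}}_i)=(\hat{{\varvec{\beta }}}_1^T{\varvec{\phi }}({\varvec{x}}_i),\ldots ,\hat{{\varvec{\beta }}}_J^T{\varvec{\phi }}({\varvec{x}}_i))^T\).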
Appendix 2: Proof of Proposition 1 and Theorem 1
Proof of Proposition 1
Let \(U_i=Y_i-{\varvec{\phi }}({\varvec{x}}_i)^TB_0^T\tilde{{\varvec{w}}}\), \(A_n=\sqrt{n}(B_0-\hat{B})\) and let
and \(Q_{n}({\varvec{t}})=\sum _{i=1}^nQ_{in}({\varvec{t}})\). Then the minimizer of \(Q_n\) is easily shown to be
Indeed, we have \(Q_n(\hat{{\varvec{t}}})=\sum _{i=1}^n \{\rho (Y_i-{\varvec{\phi }}({\varvec{x}}_i)^T\hat{B}^T\hat{{\varvec{w}}})-\rho (U_i)\}\). For \(i=1,\ldots ,n\), let
Since \({\varvec{\alpha }}_{in}({\varvec{t}})=O_P(n^{-1/2})\), we can apply the Taylor expansion of \(Q_n({\varvec{t}})\) around \({\varvec{\alpha }}_{in}({\varvec{t}})={\varvec{0}}\), which yields
Let \({\varvec{W}}_n=n^{-1/2}\sum _{i=1}^n \rho ^\prime (U_i){\varvec{\phi }}({\varvec{x}}_i)\). Then, by Lyapunov’s central limit theorem, \({\varvec{W}}_n\) is asymptotically normal with mean \({\varvec{0}}\) and variance
Assumption 3 (b) and (c) are needed here. Thus, we have \({\varvec{W}}_n=O_P(1)\). Writing \(\varGamma _n=n^{-1}\sum _{i=1}^n \rho ^{\prime \prime }(U_i){\varvec{\phi }}({\varvec{x}}_i){\varvec{\phi }}({\varvec{x}}_i)^T\), we see that \(E[\varGamma _n]\) is bounded by Assumption 3 (d). Then \(Q_n\) can be expressed as
By the convexity argument of Pollard (1991), since \(Q_n\) is convex and has a unique minimizer, the minimizer of \(Q_n\) is asymptotically equivalent to the minimizer of the leading term on the right-hand side of (12). Therefore, as \(n\rightarrow \infty \),
where
Consequently, \(\hat{{\varvec{t}}}=O_P(1)\) and this completes the proof. \(\square \)
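As a quick numerical illustration of the conclusion \(\hat{{\varvec{t}}}=O_P(1)\), i.e., that the M-estimated coefficients converge at the root-\(n\) rate, one can run a small Monte Carlo experiment. The toy linear model, the Huber loss constant, and the sample sizes below are illustrative assumptions, not the paper's setting.

```python
# A toy Monte Carlo check of the root-n rate: a Huber-loss M-estimator
# in a linear model.  The model, loss constant and sample sizes are
# illustrative assumptions, not the paper's setting.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])

def huber(u, c=1.345):
    a = np.abs(u)
    return np.where(a <= c, 0.5 * u ** 2, c * a - 0.5 * c ** 2)

for n in (200, 800, 3200):
    X = rng.normal(size=(n, 3))
    y = X @ w_true + rng.standard_t(df=3, size=n)   # heavy-tailed errors
    w_hat = minimize(lambda w: huber(y - X @ w).sum(),
                     x0=np.zeros(3), method="BFGS").x
    # If w_hat - w_true = O_P(n^{-1/2}), this stays roughly stable in n.
    print(n, np.sqrt(n) * np.linalg.norm(w_hat - w_true))
```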
Proof of Theorem 1
First, from (10), the proposed estimator can be expressed as
Next we have
From (10) and Proposition 1, \(\hat{{\varvec{w}}}-{\varvec{w}}_0=O_P(n^{-1/2})\) is satisfied. Therefore, we obtain
Consequently, we have \(\hat{S}({\varvec{x}})-S({\varvec{x}})=O_P(n^{-1/2})\) and this completes the proof. \(\square \)
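For readability, here is a hedged reconstruction of the decomposition presumably carried by the displays above, assuming the target is \(S({\varvec{x}})={\varvec{\phi }}({\varvec{x}})^TB_0^T{\varvec{w}}_0\) and that \(\hat{B}-B_0=O_P(n^{-1/2})\) follows from the dimension reduction step:

```latex
% Hedged reconstruction of the elided displays (not verbatim from the
% paper); \varvec is the Springer class macro used throughout the text.
\begin{align*}
\hat{S}({\varvec{x}})-S({\varvec{x}})
  &= {\varvec{\phi}}({\varvec{x}})^T\hat{B}^T(\hat{{\varvec{w}}}-{\varvec{w}}_0)
   + {\varvec{\phi}}({\varvec{x}})^T(\hat{B}-B_0)^T{\varvec{w}}_0\\
  &= O_P(n^{-1/2})+O_P(n^{-1/2}).
\end{align*}
```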
Cite this article
Yoshida, T. Nonlinear surface regression with dimension reduction method. AStA Adv Stat Anal 101, 29–50 (2017). https://doi.org/10.1007/s10182-016-0271-2