Abstract
This paper considers nonlinear regression analysis with a scalar response and multiple predictors. The unknown regression function is approximated by radial basis function models, and the coefficients are estimated in the framework of M-estimation. It is known that ordinary M-estimation leads to overfitting in nonlinear regression. The purpose of this paper is to construct a smooth estimator via a two-step procedure. First, sufficient dimension reduction methods are applied to the response and the radial basis functions, transforming the large number of radial bases into a small number of linear combinations of the bases without loss of information. In the second step, a multiple linear regression model between the response and the transformed radial bases is assumed, and ordinary M-estimation is applied. The final estimator is thus also obtained as a linear combination of radial bases. The validity of the proposed method and its asymptotic behavior are explored. A simulation study and a data example confirm the performance of the proposed method.
References
Alimadad, A., Salibian-Barrera, M.: An outlier-robust fit for generalized additive models with applications to disease outbreak detection. J. Am. Statist. Assoc. 106, 719–731 (2011)
Buja, A., Hastie, T., Tibshirani, R.: Linear smoothers and additive models. Ann. Stat. 17, 453–555 (1989)
Chen, X., Zhou, C., Cook, R.D.: Coordinate-independent sparse sufficient dimension reduction and variable selection. Ann. Stat. 38, 3696–3723 (2010)
Cook, R.D., Weisberg, S.: Discussion of “Sliced inverse regression for dimension reduction” by K.C. Li. J. Am. Stat. Assoc. 86, 328–332 (1991)
Cook, R.D.: Principal Hessian directions revisited. J. Am. Stat. Assoc. 93, 84–100 (1998)
Cook, R.D., Li, B.: Dimension reduction for the conditional mean in regression. Ann. Stat. 30, 455–474 (2002)
de Boor, C.: A Practical Guide to Splines, revised edn. Springer, Berlin (2001)
Diaconis, P., Freedman, D.: Asymptotics of graphical projection pursuit. Ann. Stat. 12, 793–815 (1984)
Ganguli, B., Wand, M.P.: Feature significance in geostatistics. J. Comput. Gr. Stat. 13, 954–973 (2004)
Gu, C.: Smoothing Spline ANOVA Models. Springer, Berlin (2002)
Green, P.J., Silverman, B.W.: Nonparametric Regression and Generalized Linear Models. Chapman and Hall, London (1993)
Hall, P., Li, K.C.: On almost linearity of low dimensional projections from high dimensional data. Ann. Stat. 21, 867–889 (1993)
Hsing, T., Carroll, R.J.: An asymptotic theory for sliced inverse regression. Ann. Stat. 20, 1040–1061 (1992)
Huber, P., Ronchetti, E.M.: Robust Statistics, 2nd edn. Wiley, Hoboken (2009)
Hubert, M., Rousseeuw, P.J., Branden, K.V.: ROBPCA: a new approach to robust principal component analysis. Technometrics 47, 64–79 (2005)
Koenker, R., Bassett, G.: Regression quantiles. Econometrica 46, 33–50 (1978)
Lee, T.C.M., Oh, H.: Robust penalized regression spline fitting with application to additive mixed modeling. Comput. Stat. 22, 159–171 (2007)
Li, K.C.: Sliced inverse regression for dimension reduction. J. Am. Stat. Assoc. 86, 316–327 (1991)
Li, K.C.: On principal Hessian directions for data visualization and dimension reduction: another application of Stein’s lemma. J. Am. Stat. Assoc. 87, 1025–1039 (1992)
Li, B., Zha, H., Chiaromonte, F.: Contour regression: a general approach to dimension reduction. Ann. Stat. 33, 1580–1616 (2005)
Li, B., Wang, S.: On directional regression for dimension reduction. J. Am. Stat. Assoc. 102, 997–1008 (2007)
Li, L.: Sparse sufficient dimension reduction. Biometrika 94, 603–611 (2007)
Li, Y., Zhu, L.X.: Asymptotics for sliced average variance estimation. Ann. Stat. 35, 41–69 (2007)
Li, F., Villani, M.: Efficient Bayesian multivariate surface regression. Scand. J. Stat. 40, 706–723 (2013)
Nychka, D.W.: Spatial process estimates as smoothers. In: Schimek, M. (ed.) Smoothing and Regression. Springer, Heidelberg (2000)
Pollard, D.: Asymptotics for least absolute deviation regression estimators. Econom. Theory 7, 186–199 (1991)
Ruppert, D., Wand, M.P., Carroll, R.J.: Semiparametric Regression. Cambridge University Press, Cambridge (2003)
Wahba, G., Wang, Y., Gu, C., Klein, R., Klein, B.: Smoothing spline ANOVA for exponential families, with application to the Wisconsin epidemiological study of diabetic retinopathy. Ann. Stat. 23, 1865–1895 (1995)
Wong, R.K., Yao, F., Lee, T.C.: Robust estimation for generalized additive models. J. Comput. Gr. Stat. 23, 270–289 (2014)
Wood, S.N.: Thin plate regression splines. J. R. Stat. Soc. B. 65, 95–114 (2003)
Wood, S.N.: Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC, Boca Raton (2006)
Xiao, L., Li, Y., Ruppert, D.: Fast bivariate \(P\)-splines: the sandwich smoother. J. R. Stat. Soc. B 75, 577–599 (2013)
Zhu, L., Miao, B., Peng, H.: On sliced inverse regression with high-dimensional covariates. J. Am. Stat. Assoc. 101, 630–643 (2006)
Acknowledgments
The author is grateful to the Editor, the Associate Editor, and two anonymous referees for their valuable comments and suggestions, which led to improvements in the paper. This research was partly supported by KAKENHI 26730019.
Appendices
Appendix 1: Examples of sufficient dimension reduction methods
We describe here the estimation algorithm to obtain \(\{\hat{{\varvec{\beta }}}_1,\ldots ,\hat{{\varvec{\beta }}}_J\}\) in Sect. 2. Several sufficient dimension reduction methods can be formulated as the generalized eigenvalue problem
$$\begin{aligned} M{\varvec{\beta }}_j=\lambda _j G{\varvec{\beta }}_j,\quad j=1,\ldots ,K, \end{aligned}$$
(11)
where the \(K\times K\) matrix M is symmetric and nonnegative definite, the \(K\times K\) matrix G is positive definite, the eigenvectors \({\varvec{\beta }}_1,\ldots ,{\varvec{\beta }}_K\) satisfy \({\varvec{\beta }}_i^TG{\varvec{\beta }}_j=1\) for \(i=j\) and 0 otherwise, and \(\lambda _1\ge \cdots \ge \lambda _K\ge 0\) are the corresponding eigenvalues. By specifying M and G, several dimension reduction methods such as SIR, SAVE and PHD can be obtained. Let \(\varSigma =E[\{{\varvec{\phi }}(X)-E[{\varvec{\phi }}(X)]\}\{{\varvec{\phi }}(X)-E[{\varvec{\phi }}(X)]\}^T]\) be the covariance matrix of \({\varvec{\phi }}(X)\), and let \({\varvec{Z}}=\varSigma ^{-1/2}({\varvec{\phi }}(X)-E[{\varvec{\phi }}(X)])\) be the standardized version of \({\varvec{\phi }}(X)\). When \(M=Cov(E[{\varvec{\phi }}(X)-E[{\varvec{\phi }}(X)]|y])\) and \(G=\varSigma \) are used, (11) yields SIR. SAVE corresponds to the case \(M=\varSigma ^{1/2}E[\{I-Cov({\varvec{Z}}|y)\}^2]\varSigma ^{1/2}\) and \(G=\varSigma \), where I is the identity matrix. For PHD, M and G are selected as \(M=\varSigma ^{1/2}\varSigma _{yzz}^2\varSigma ^{1/2}\) and \(G=\varSigma \), where \(\varSigma _{yzz}=E[\{Y-E[Y]\}{\varvec{Z}}{\varvec{Z}}^T]\). The regularized versions of SIR, SAVE and PHD with a ridge penalty are obtained by modifying \(G=\varSigma +\gamma I\), where \(\gamma >0\) is a tuning parameter; in the regularized versions, the tuning parameter must be selected in practice. Other methods are summarized in Li (2007) and Chen et al. (2010). The estimator \(\{\hat{{\varvec{\beta }}}_1,\ldots ,\hat{{\varvec{\beta }}}_J\}\) of \(\{{\varvec{\beta }}_1,\ldots ,{\varvec{\beta }}_J\}\) is obtained from the sample versions of M and G, written as \(\hat{M}\) and \(\hat{G}\). The estimation algorithms are detailed in Li (1991), Cook and Weisberg (1991), Li (1992) and Cook (1998).
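To make this concrete, the following is a minimal sketch of the SIR case of (11) with the ridge-regularized \(G=\hat{\varSigma }+\gamma I\). The Gaussian basis construction, the slicing scheme, and all function names below are illustrative assumptions, not taken from the paper.

```python
# A minimal sketch of the SIR case of (11) with ridge-regularized G.
# The Gaussian basis, slicing scheme and all names here are
# illustrative assumptions, not taken from the paper.
import numpy as np
from scipy.linalg import eigh

def rbf_features(X, centers, sigma):
    """Gaussian radial basis functions phi(x), one column per center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def sir_directions(Phi, y, J, n_slices=10, gamma=0.1):
    """Estimate beta_1, ..., beta_J from M beta = lambda G beta."""
    n, K = Phi.shape
    Phi_c = Phi - Phi.mean(axis=0)                # phi(x_i) - sample mean
    G = Phi_c.T @ Phi_c / n + gamma * np.eye(K)   # Sigma-hat + ridge term
    # SIR kernel M-hat: covariance of the within-slice means of phi(x)
    M = np.zeros((K, K))
    for idx in np.array_split(np.argsort(y), n_slices):
        m = Phi_c[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Generalized eigenproblem; eigh returns eigenvalues in ascending
    # order, with eigenvectors normalized so that beta^T G beta = 1.
    _, vecs = eigh(M, G)
    return vecs[:, ::-1][:, :J]                   # J leading directions
```

Under these assumptions, the columns of the returned matrix play the role of \(\hat{{\varvec{\beta }}}_1,\ldots ,\hat{{\varvec{\beta }}}_J\), and the second-step predictors are the transformed bases \(\hat{B}{\varvec{\phi }}({\varvec{x}}_i)=(\hat{{\varvec{\beta }}}_1^T{\varvec{\phi }}({\varvec{x}}_i),\ldots ,\hat{{\varvec{\beta }}}_J^T{\varvec{\phi }}({\varvec{x}}_i))^T\).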
Appendix 2: Proof of Proposition 1 and Theorem 1
Proof of Proposition 1
Let \(U_i=Y_i-{\varvec{\phi }}({\varvec{x}}_i)^TB_0^T\tilde{{\varvec{w}}}\), \(A_n=\sqrt{n}(B_0-\hat{B})\) and let
and \(Q_{n}({\varvec{t}})=\sum _{i=1}^nQ_{in}({\varvec{t}})\). Then the minimizer of \(Q_n\) is easily shown to be
Indeed, we have \(Q_n(\hat{{\varvec{t}}})=\sum _{i=1}^n \{\rho (Y_i-{\varvec{\phi }}({\varvec{x}}_i)^T\hat{B}^T\hat{{\varvec{w}}})-\rho (U_i)\}\). For \(i=1,\ldots ,n\), let
Since \({\varvec{\alpha }}_{in}({\varvec{t}})=O_P(n^{-1/2})\), we can apply the Taylor expansion of \(Q_n({\varvec{t}})\) around \({\varvec{\alpha }}_{in}({\varvec{t}})={\varvec{0}}\), which yields
Let \({\varvec{W}}_n=n^{-1/2}\sum _{i=1}^n \rho ^\prime (U_i){\varvec{\phi }}({\varvec{x}}_i)\). Then, by Lyapunov’s central limit theorem, \({\varvec{W}}_n\) is asymptotically normal with mean \({\varvec{0}}\) and variance
Assumption 3 (b) and (c) are needed here. Thus, we have \({\varvec{W}}_n=O_P(1)\). Writing \(\varGamma _n=n^{-1}\sum _{i=1}^n \rho ^{\prime \prime }(U_i){\varvec{\phi }}({\varvec{x}}_i){\varvec{\phi }}({\varvec{x}}_i)^T\), we see that \(E[\varGamma _n]\) is bounded by Assumption 3 (d). Then \(Q_n\) can be expressed as
By the convexity argument of Pollard (1991), since \(Q_n\) is convex and has a unique minimizer, the minimizer of \(Q_n\) is asymptotically equivalent to the minimizer of the leading term on the right-hand side of (12). Therefore, as \(n\rightarrow \infty \),
where
Consequently, \(\hat{{\varvec{t}}}=O_P(1)\) and this completes the proof. \(\square \)
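As a quick numerical illustration of the conclusion \(\hat{{\varvec{t}}}=O_P(1)\), i.e., that the M-estimated coefficients converge at the root-\(n\) rate, one can run a small Monte Carlo experiment. The toy linear model, the Huber loss constant, and the sample sizes below are illustrative assumptions, not the paper's setting.

```python
# A toy Monte Carlo check of the root-n rate: a Huber-loss M-estimator
# in a linear model.  The model, loss constant and sample sizes are
# illustrative assumptions, not the paper's setting.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])

def huber(u, c=1.345):
    a = np.abs(u)
    return np.where(a <= c, 0.5 * u ** 2, c * a - 0.5 * c ** 2)

for n in (200, 800, 3200):
    X = rng.normal(size=(n, 3))
    y = X @ w_true + rng.standard_t(df=3, size=n)   # heavy-tailed errors
    w_hat = minimize(lambda w: huber(y - X @ w).sum(),
                     x0=np.zeros(3), method="BFGS").x
    # If w_hat - w_true = O_P(n^{-1/2}), this stays roughly stable in n.
    print(n, np.sqrt(n) * np.linalg.norm(w_hat - w_true))
```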
Proof of Theorem 1
First, from (10), the proposed estimator can be expressed as
Next we have
From (10) and Proposition 1, \(\hat{{\varvec{w}}}-{\varvec{w}}_0=O_P(n^{-1/2})\) is satisfied. Therefore, we obtain
Consequently, we have \(\hat{S}({\varvec{x}})-S({\varvec{x}})=O_P(n^{-1/2})\) and this completes the proof. \(\square \)
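For readability, here is a hedged reconstruction of the decomposition presumably carried by the displays above, assuming the target is \(S({\varvec{x}})={\varvec{\phi }}({\varvec{x}})^TB_0^T{\varvec{w}}_0\) and that \(\hat{B}-B_0=O_P(n^{-1/2})\) follows from the dimension reduction step:

```latex
% Hedged reconstruction of the elided displays (not verbatim from the
% paper); \varvec is the Springer class macro used throughout the text.
\begin{align*}
\hat{S}({\varvec{x}})-S({\varvec{x}})
  &= {\varvec{\phi}}({\varvec{x}})^T\hat{B}^T(\hat{{\varvec{w}}}-{\varvec{w}}_0)
   + {\varvec{\phi}}({\varvec{x}})^T(\hat{B}-B_0)^T{\varvec{w}}_0\\
  &= O_P(n^{-1/2})+O_P(n^{-1/2}).
\end{align*}
```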
Cite this article
Yoshida, T. Nonlinear surface regression with dimension reduction method. AStA Adv Stat Anal 101, 29–50 (2017). https://doi.org/10.1007/s10182-016-0271-2