Metrika, Volume 76, Issue 7, pp 887–908

Variable selection for high-dimensional varying coefficient partially linear models via nonconcave penalty

  • Zhaoping Hong
  • Yuao Hu
  • Heng Lian

Abstract

In this paper, we consider the problem of simultaneous variable selection and estimation for varying-coefficient partially linear models in a “small \(n\), large \(p\)” setting, in which the number of coefficients in the linear part diverges with the sample size while the number of varying coefficients is fixed. A similar problem was considered by Lam and Fan (Ann Stat 36(5):2232–2260, 2008) based on kernel estimates for the nonparametric part, but no variable selection was investigated there and \(p\) was assumed to be smaller than \(n\). Here we use polynomial splines to approximate the nonparametric coefficients, which is computationally more expedient, establish the convergence rates as well as the asymptotic normality of the linear coefficients, and further present the oracle property of the SCAD-penalized estimator, which holds for \(p\) almost as large as \(\exp \{n^{1/2}\}\) under mild assumptions. Monte Carlo studies and a real data analysis are presented to demonstrate the finite-sample behavior of the proposed estimator. Our theoretical and empirical investigations are carried out for generalized varying-coefficient partially linear models, which include both Gaussian data and binary data as special cases.
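For readers unfamiliar with the SCAD penalty named in the abstract, the following is a minimal sketch of the penalty function as defined in Fan and Li (2001), evaluated at a single coefficient. It is an illustration only and not the authors' implementation; the function name `scad_penalty` is hypothetical, and the default `a = 3.7` is the conventional value suggested by Fan and Li (2001), assumed here rather than taken from this article.

```python
def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lam(|theta|) of Fan and Li (2001) for one coefficient.

    lam is the tuning parameter (lam >= 0); a > 2 controls where the
    penalty flattens out. a = 3.7 is an assumed conventional default.
    """
    if lam < 0 or a <= 2:
        raise ValueError("require lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear region near zero: encourages sparsity.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region joining the linear and flat parts.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Flat region: large coefficients receive a constant penalty,
    # so they are not shrunk further.
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    for value in (0.5, 2.0, 10.0):
        print(value, scad_penalty(value, lam=1.0))
```

The penalty behaves like the Lasso for small coefficients and becomes constant for large ones, which is the property that underlies the reduced bias of SCAD-type estimators relative to the Lasso.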

Keywords

Bayesian information criterion, Cross-validation, SCAD penalty

Notes

Acknowledgments

The authors sincerely thank the two referees for their insightful comments and suggestions, which have led to improvements on the original manuscript. The research of Heng Lian is supported by a Singapore MOE Tier 1 Grant.

References

  1. Cai Z, Fan J, Li R (2000) Efficient estimation and inferences for varying-coefficient models. J Am Stat Assoc 95(451):941–956
  2. Chiang CT, Rice JA, Wu C (2001) Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. J Am Stat Assoc 96(454):605–619
  3. Chiaretti S, Li X, Gentleman R, Vitale A, Vignetti M, Mandelli F, Ritz J, Foa R (2004) Gene expression profile of adult T-cell acute lymphocytic leukemia identifies distinct subsets of patients with different response to therapy and survival. Blood 103(7):2771–2778
  4. De Boor C (2001) A practical guide to splines, rev. edn. Springer, New York
  5. Eubank RL, Huang C, Maldonado YM, Wang N, Wang S, Buchanan RJ (2004) Smoothing spline estimation in varying-coefficient models. J R Stat Soc Ser B Stat Methodol 66:653–667
  6. Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96(456):1348–1360
  7. Fan J, Lv J (2011) Nonconcave penalized likelihood with NP-dimensionality. IEEE Trans Inf Theory 57:5467–5484
  8. Fan J, Peng H (2004) Nonconcave penalized likelihood with a diverging number of parameters. Ann Stat 32(3):928–961
  9. Fan J, Zhang W (1999) Statistical estimation in varying coefficient models. Ann Stat 27(5):1491–1518
  10. Fan J, Zhang J (2000) Two-step estimation of functional linear models with applications to longitudinal data. J R Stat Soc Ser B Stat Methodol 62:303–322
  11. Fan J, Feng Y, Song R (2011) Nonparametric independence screening in sparse ultra-high-dimensional additive models. J Am Stat Assoc 106:544–557
  12. Frank I, Friedman J (1993) A statistical view of some chemometrics regression tools. Technometrics 35:109–135
  13. Hastie T, Tibshirani R (1993) Varying-coefficient models. J R Stat Soc Ser B Methodol 55(4):757–796
  14. Huang JZ, Wu C, Zhou L (2002) Varying-coefficient models and basis function approximations for the analysis of repeated measurements. Biometrika 89(1):111–128
  15. Huang JZ, Wu C, Zhou L (2004) Polynomial spline estimation and inference for varying coefficient models with longitudinal data. Stat Sin 14(3):763–788
  16. Huang J, Horowitz J, Ma S (2008) Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann Stat 36(2):587–613
  17. Huang J, Horowitz J, Wei F (2010) Variable selection in nonparametric additive models. Ann Stat 38(4):2282–2313
  18. Kim Y, Choi H, Oh H (2008) Smoothly clipped absolute deviation on high dimensions. J Am Stat Assoc 103(484):1665–1673
  19. Lam C, Fan J (2008) Profile-kernel likelihood inference with diverging number of parameters. Ann Stat 36(5):2232–2260
  20. Li R, Liang H (2008) Variable selection in semiparametric regression modeling. Ann Stat 36(1):261–286
  21. McCullagh P, Nelder JA (1989) Generalized linear models, 2nd edn. Chapman and Hall, London, New York
  22. Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B Methodol 58(1):267–288
  23. van der Geer SA (2000) Applications of empirical process theory. Cambridge University Press, Cambridge
  24. Wang H, Xia Y (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104(486):747–757
  25. Wang L, Li H, Huang JZ (2008) Variable selection in nonparametric varying-coefficient models for analysis of repeated measurements. J Am Stat Assoc 103(484):1556–1569
  26. Wang L, Wu Y, Li R (2012) Quantile regression for analyzing heterogeneity in ultra-high dimension. J Am Stat Assoc 107(497):214–222
  27. Wang L, Liu X, Liang H, Carroll R (2011) Estimation and variable selection for generalized additive partially linear models. Ann Stat 39:1827–1851
  28. Wei F, Huang J, Li H (2011) Variable selection in high-dimensional varying-coefficient models. Stat Sin 21:1515–1540
  29. Xie H, Huang J (2009) SCAD-penalized regression in high-dimensional partially linear models. Ann Stat 37(2):673–696
  30. Yuan M, Lin Y (2007) On the non-negative garrotte estimator. J R Stat Soc Ser B Stat Methodol 69:143–161
  31. Zhang C (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38(2):894–942
  32. Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101(476):1418–1429
  33. Zou H, Li R (2008) One-step sparse estimates in nonconcave penalized likelihood models. Ann Stat 36(4):1509–1533

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  1. Division of Mathematical Sciences, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore, Singapore
