Abstract
Nonparametric methods provide more flexible tools than parametric methods for modeling complex systems and discovering nonlinear patterns hidden in data. Traditional nonparametric methods, however, are challenged by modern high-dimensional data due to the curse of dimensionality. Over the past two decades, nonparametrics has advanced rapidly to accommodate the analysis of large-scale, high-dimensional data. A variety of cutting-edge nonparametric methodologies, scalable algorithms, and state-of-the-art computational tools have been designed for model estimation, variable selection, and statistical inference in high-dimensional regression and classification problems. This chapter provides an overview of recent advances in nonparametrics for big data analytics.
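As a minimal illustration of the flexibility described above (an editorial sketch, not part of the chapter itself), the snippet below fits the classical Nadaraya-Watson kernel smoother to data with a nonlinear trend that a straight-line parametric fit would miss. The bandwidth value and simulated data are illustrative assumptions.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.3):
    """Nadaraya-Watson kernel regression with a Gaussian kernel.

    Estimates m(x) = E[Y | X = x] as a locally weighted average of the
    observed responses, with weights decaying smoothly in the scaled
    distance |x - x_i| / bandwidth.
    """
    # Pairwise scaled distances between query points and training points
    u = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * u**2)  # Gaussian kernel weights
    # Weighted average of responses at each query point
    return weights @ y_train / weights.sum(axis=1)

# Simulated nonlinear signal with additive noise
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

# Evaluate the smoother on a grid away from the boundaries
grid = np.linspace(0.5, 2 * np.pi - 0.5, 50)
fit = nadaraya_watson(x, y, grid)
```

With one predictor the smoother works well, but extending such locally weighted averages to many predictors runs directly into the curse of dimensionality the chapter discusses: local neighborhoods become empty as the dimension grows.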
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this chapter
Zhang, H.H. (2018). Nonparametric Methods for Big Data Analytics. In: Härdle, W., Lu, HS., Shen, X. (eds) Handbook of Big Data Analytics. Springer Handbooks of Computational Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-18284-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18283-4
Online ISBN: 978-3-319-18284-1
eBook Packages: Mathematics and Statistics (R0)