Abstract
Nonparametric methods provide more flexible tools than parametric methods for modeling complex systems and discovering nonlinear patterns hidden in data. Traditional nonparametric methods, however, are challenged by modern high-dimensional data due to the curse of dimensionality. Over the past two decades, nonparametrics has advanced rapidly to accommodate the analysis of large-scale, high-dimensional data. A variety of cutting-edge nonparametric methodologies, scalable algorithms, and state-of-the-art computational tools have been designed for model estimation, variable selection, and statistical inference in high-dimensional regression and classification problems. This chapter provides an overview of recent advances in nonparametrics for big data analytics.
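As a minimal illustration of the flexibility described above (an editorial sketch, not part of the chapter itself), the snippet below fits the classical Nadaraya-Watson kernel smoother to data with a nonlinear trend that a straight-line parametric fit would miss. The bandwidth value and simulated data are illustrative assumptions.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.3):
    """Nadaraya-Watson kernel regression with a Gaussian kernel.

    Estimates m(x) = E[Y | X = x] as a locally weighted average of the
    observed responses, with weights decaying smoothly in the scaled
    distance |x - x_i| / bandwidth.
    """
    # Pairwise scaled distances between query points and training points
    u = (x_query[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * u**2)  # Gaussian kernel weights
    # Weighted average of responses at each query point
    return weights @ y_train / weights.sum(axis=1)

# Simulated nonlinear signal with additive noise
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + 0.1 * rng.normal(size=x.size)

# Evaluate the smoother on a grid away from the boundaries
grid = np.linspace(0.5, 2 * np.pi - 0.5, 50)
fit = nadaraya_watson(x, y, grid)
```

With one predictor the smoother works well, but extending such locally weighted averages to many predictors runs directly into the curse of dimensionality the chapter discusses: local neighborhoods become empty as the dimension grows.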
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this chapter
Zhang, H.H. (2018). Nonparametric Methods for Big Data Analytics. In: Härdle, W., Lu, HS., Shen, X. (eds) Handbook of Big Data Analytics. Springer Handbooks of Computational Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-18284-1_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18283-4
Online ISBN: 978-3-319-18284-1
eBook Packages: Mathematics and Statistics (R0)