Nonparametric Methods for Big Data Analytics

Chapter in Handbook of Big Data Analytics

Part of the book series: Springer Handbooks of Computational Statistics (SHCS)


Abstract

Nonparametric methods provide more flexible tools than parametric methods for modeling complex systems and discovering nonlinear patterns hidden in data. Traditional nonparametric methods are challenged by modern high-dimensional data due to the curse of dimensionality. Over the past two decades, there have been rapid advances in nonparametrics to accommodate the analysis of large-scale, high-dimensional data. A variety of cutting-edge nonparametric methodologies, scalable algorithms, and state-of-the-art computational tools have been developed for model estimation, variable selection, and statistical inference in high-dimensional regression and classification problems. This chapter provides an overview of recent advances in nonparametrics for big data analytics.
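
As a concrete illustration of the classical tools this chapter builds on, the short Python sketch below implements a Nadaraya-Watson kernel regression estimator with a Gaussian kernel. It is not code from the chapter; the function name, bandwidth value, and simulated data are illustrative assumptions. Smoothers of this kind capture nonlinear patterns with one or a few covariates, but their local averages break down as the number of covariates grows, which is the curse of dimensionality that motivates the high-dimensional methods surveyed here.

# Illustrative sketch (not from the chapter): Nadaraya-Watson kernel regression in NumPy.
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth=0.2):
    """Estimate E[Y | X = x] at each query point by a Gaussian-kernel weighted average."""
    # Squared distances between every query point and every training point.
    sq_dist = (x_query[:, None] - x_train[None, :]) ** 2
    weights = np.exp(-0.5 * sq_dist / bandwidth ** 2)    # Gaussian kernel weights
    return weights @ y_train / weights.sum(axis=1)       # locally weighted average

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=200)  # nonlinear signal plus noise
grid = np.linspace(0.0, 1.0, 5)
print(np.round(nadaraya_watson(x, y, grid), 2))  # estimated regression function on a grid

With a single covariate, a fixed bandwidth finds enough nearby observations to average over; with many covariates almost no training points fall near a query point, which is why the chapter turns to structured alternatives such as additive models and penalized variable selection.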

Author information

Correspondence to Hao Helen Zhang.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Zhang, H.H. (2018). Nonparametric Methods for Big Data Analytics. In: Härdle, W., Lu, HS., Shen, X. (eds) Handbook of Big Data Analytics. Springer Handbooks of Computational Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-18284-1_5
