Regularization in statistics

Bickel, Peter J.; Li, Bo; Tsybakov, Alexandre B.; van de Geer, Sara A.; Yu, Bin; Valdés, Teófilo; Rivero, Carlos; Fan, Jianqing; van der Vaart, Aad

doi:10.1007/BF02607055

Published: September 2006

Test, Volume 15, pages 271–344 (2006)

Abstract

This paper is a selective review of the regularization methods scattered in the statistics literature. We introduce a general conceptual approach to regularization and fit most existing methods into it. We have tried to focus on the importance of regularization when dealing with today's high-dimensional objects: data and models. A wide range of examples is discussed, including nonparametric regression, boosting, covariance matrix estimation, principal component estimation, and subsampling.
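As a hedged illustration of one regularizer from the covariance-estimation family listed above, the sketch below bands a sample covariance matrix, an idea studied by Bickel and Levina for large covariance matrices. It is not code from the paper: the function names `sample_covariance` and `band`, the pure-Python list-of-lists representation, and the bandwidth parameter `k` are assumptions made for illustration. Banding keeps entries within `k` of the diagonal and sets the rest to zero, which can reduce estimation error when the variables have a natural ordering and far-off-diagonal covariances are small, a situation that matters most when the dimension p is large relative to the sample size n.

```python
from typing import List

Matrix = List[List[float]]


def sample_covariance(data: Matrix) -> Matrix:
    """Sample covariance of n observations (rows) of p variables (columns).

    Uses the n - 1 divisor (an assumption; a 1/n divisor bands the same way).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    p = len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            s = sum((row[i] - means[i]) * (row[j] - means[j]) for row in data)
            cov[i][j] = cov[j][i] = s / (n - 1)  # fill both halves (symmetric)
    return cov


def band(matrix: Matrix, k: int) -> Matrix:
    """Banding regularizer: keep entries with |i - j| <= k, zero out the rest."""
    if k < 0:
        raise ValueError("bandwidth k must be non-negative")
    p = len(matrix)
    return [[matrix[i][j] if abs(i - j) <= k else 0.0 for j in range(p)]
            for i in range(p)]


if __name__ == "__main__":
    obs = [[1.0, 2.0, 3.0], [2.0, 4.0, 1.0], [3.0, 6.0, 2.0]]
    S = sample_covariance(obs)
    print(band(S, 1))  # [[1.0, 2.0, 0.0], [2.0, 4.0, -1.0], [0.0, -1.0, 1.0]]
```

The sketch leaves the choice of `k` to the caller; in practice it would be selected from the data, for example by a resampling or cross-validation scheme. A banded matrix also need not be positive semidefinite, so downstream code that requires a valid covariance matrix may need an extra correction step.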

Author information

Authors and Affiliations

Department of Statistics, University of California at Berkeley, USA
Peter J. Bickel & Bin Yu
School of Economics and Management, Tsinghua University, China
Bo Li
Laboratoire de Probabilités et Modèles Aléatoires, Université Paris VI, France
Alexandre B. Tsybakov
Seminar für Statistik, ETH Zürich, Switzerland
Sara A. van de Geer
Department of Statistics and Operational Research, Complutense University of Madrid, Spain
Teófilo Valdés & Carlos Rivero
Department of Operations Research and Financial Engineering, Princeton University, USA
Jianqing Fan
Department of Mathematics, Vrije Universiteit Amsterdam, Netherlands
Aad van der Vaart

Corresponding author

Correspondence to Peter J. Bickel.

About this article

Cite this article

Bickel, P.J., Li, B., Tsybakov, A.B. et al. Regularization in statistics. Test 15, 271–344 (2006). https://doi.org/10.1007/BF02607055
