
Regularization in statistics

Published in Test

Abstract

This paper is a selective review of the regularization methods scattered throughout the statistics literature. We introduce a general conceptual approach to regularization and fit most existing methods into it. We have tried to focus on the importance of regularization when dealing with today's high-dimensional objects: data and models. A wide range of examples is discussed, including nonparametric regression, boosting, covariance matrix estimation, principal component estimation, and subsampling.
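
A common pattern behind regularization is a family of estimators indexed by a tuning parameter, with that parameter then chosen from the data. The sketch below illustrates this pattern with ridge regression, selecting the penalty by generalized cross-validation (GCV). It is a standard textbook instance chosen for illustration and does not reproduce the paper's own framework. The function names (`ridge_fit`, `gcv_score`, `select_lambda`), the simulated data, and the λ grid are all hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Ridge estimate: argmin_b ||y - X b||^2 + lam * ||b||^2."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def gcv_score(X, y, lam):
    """GCV(lam) = (1/n)||(I - H)y||^2 / (1 - tr(H)/n)^2 for the ridge hat matrix H."""
    n, p = X.shape
    # Hat (smoother) matrix H = X (X'X + lam I)^{-1} X'
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    df = np.trace(H)  # effective degrees of freedom of the fit
    return (resid @ resid / n) / (1.0 - df / n) ** 2

def select_lambda(X, y, grid):
    """Evaluate GCV on a grid of penalties and return the minimizer."""
    scores = [gcv_score(X, y, lam) for lam in grid]
    best = int(np.argmin(scores))
    return grid[best], scores

# Hypothetical simulated data: 50 observations, 40 predictors, 5 nonzero coefficients.
rng = np.random.default_rng(0)
n, p = 50, 40
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 2.0
y = X @ beta + rng.standard_normal(n)

grid = np.logspace(-3, 3, 25)  # hypothetical penalty grid
lam_hat, scores = select_lambda(X, y, grid)
b_hat = ridge_fit(X, y, lam_hat)
```

Larger penalties shrink the coefficient vector toward zero, and λ = 0 recovers ordinary least squares. Other data-driven selectors, such as K-fold cross-validation or a C_p-type criterion, could replace `gcv_score` inside `select_lambda` without changing the overall structure.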



Author information

Correspondence to Peter J. Bickel.

About this article

Cite this article

Bickel, P.J., Li, B., Tsybakov, A.B. et al. Regularization in statistics. Test 15, 271–344 (2006). https://doi.org/10.1007/BF02607055
