Abstract
This paper is a selective review of the regularization methods scattered throughout the statistics literature. We introduce a general conceptual approach to regularization and fit most existing methods into it. We have tried to focus on the importance of regularization when dealing with today's high-dimensional objects: data and models. A wide range of examples is discussed, including nonparametric regression, boosting, covariance matrix estimation, principal component estimation, and subsampling.
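The sketch below makes the idea of regularization concrete in one of the settings listed above, covariance matrix estimation. It bands the sample covariance matrix, which is the kind of estimator studied by Bickel and Levina (2006) in the reference list. When the number of variables p is large relative to the sample size n, the sample covariance is noisy, and once p exceeds n - 1 it is singular. Banding replaces it with a simpler estimate indexed by a band width k. This is an illustration under stated assumptions, not the paper's own code: the helper names `sample_covariance` and `band` are hypothetical, k is fixed by hand rather than chosen from the data, and only the Python standard library is used.

```python
from typing import List

Matrix = List[List[float]]


def sample_covariance(data: Matrix) -> Matrix:
    """Sample covariance matrix (divisor n - 1) of n observations of p variables.

    `data` is a list of n rows, each a list of p floats.
    """
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    cov = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i, p):
            s = sum((row[i] - means[i]) * (row[j] - means[j]) for row in data)
            cov[i][j] = cov[j][i] = s / (n - 1)  # fill both halves: matrix is symmetric
    return cov


def band(cov: Matrix, k: int) -> Matrix:
    """Hypothetical banding helper: keep entries with |i - j| <= k, zero the rest.

    k acts as the regularization parameter: k = 0 keeps only the variances,
    k = p - 1 returns the unregularized sample covariance.
    """
    p = len(cov)
    return [[cov[i][j] if abs(i - j) <= k else 0.0 for j in range(p)]
            for i in range(p)]


# Usage: 3 observations of 4 variables, so p > n - 1 and S is singular.
data = [[1.0, 2.0, 0.0, 1.0],
        [2.0, 1.0, 1.0, 0.0],
        [3.0, 0.0, 2.0, 2.0]]
S = sample_covariance(data)
S_banded = band(S, k=1)
for row in S_banded:
    print(row)
```

Setting k = 0 gives the most heavily regularized estimate (a diagonal matrix), while k = p - 1 leaves the sample covariance untouched. Banding alone does not guarantee a positive definite result, and a data-driven choice of k is deliberately left out of this sketch.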
References
Akaike, H. (1970). Statistical predictor identification. Annals of the Institute of Statistical Mathematics, 22:203–217.
Bair, E., Hastie, T. J., Paul, D., and Tibshirani, R. (2006). Prediction by supervised principal components. Journal of the American Statistical Association, 101(473):119–137.
Bickel, P. J., Götze, F., and van Zwet, W. R. (1997). Resampling fewer than n observations: gains, losses, and remedies for losses. Statistica Sinica, 7(1):1–31. Empirical Bayes, sequential analysis and related topics in statistics and probability (New Brunswick, NJ, 1995).
Bickel, P. J., Klaassen, C. A. J., Ritov, Y., and Wellner, J. A. (1998). Efficient and adaptive estimation for semiparametric models. Reprint of the 1993 original. Springer-Verlag, New York.
Bickel, P. J. and Levina, E. (2004). Some theory of Fisher's linear discriminant function, ‘naive Bayes’, and some alternatives when there are many more variables than observations. Bernoulli, 10(6):989–1010.
Bickel, P. J. and Levina, E. (2006). Regularized estimation of large covariance matrices. Technical Report 716, Department of Statistics, University of California, Berkeley, CA.
Bickel, P. J., Ritov, Y., and Zakai, A. (2006). Some theory for generalized boosting algorithms. Journal of Machine Learning Research. To appear.
Bickel, P. J. and Sakov, A. (2005). On the choice of m in the m out of n bootstrap and its application to confidence bounds for extreme percentiles. Unpublished.
Birgé, L. and Massart, P. (1997). From model selection to adaptive estimation. In D. Pollard, E. Torgersen, and G. Yang, eds., A Festschrift for Lucien Le Cam: Research papers in Probability and Statistics, pp. 55–87. Springer-Verlag, New York.
Birgé, L. and Massart, P. (2001). Gaussian model selection. Journal of the European Mathematical Society, 3(3):203–268.
Böttcher, A. and Silbermann, B. (1999). Introduction to large truncated Toeplitz matrices. Universitext. Springer-Verlag, New York.
Breiman, L. (1996). Heuristics of instability and stabilization in model selection. The Annals of Statistics, 24(6):2350–2383.
Breiman, L., Stone, C. J., and Kooperberg, C. (1990). Robust confidence bounds for extreme upper quantiles. Journal of Statistical Computation and Simulation, 37(3–4):127–149.
Bühlmann, P. (2006). Boosting for high-dimensional linear models. The Annals of Statistics, 34(2):559–583.
Bühlmann, P. and Yu, B. (2006). Sparse boosting. Journal of Machine Learning Research, 7:1001–1024.
Bunea, F., Wegkamp, M. H., and Auguste, A. (2006). Consistent variable selection in high dimensional regression via multiple testing. Journal of Statistical Planning and Inference, 136(12):4349–4364.
Chen, H. (1988). Convergence rates for parametric components in a partly linear model. The Annals of Statistics, 16(1):136–146.
Craven, P. and Wahba, G. (1979). Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numerische Mathematik, 31(4):377–403.
Daniels, M. J. and Pourahmadi, M. (2002). Bayesian analysis of covariance matrices and dynamic models for longitudinal data. Biometrika, 89(3):553–566.
Datta, S. and McCormick, W. P. (1995). Bootstrap inference for a first-order autoregression with positive innovations. Journal of the American Statistical Association, 90(432):1289–1300.
Devroye, L., Györfi, L., and Lugosi, G. (1996). A probabilistic theory of pattern recognition, Vol. 31 of Applications of Mathematics (New York). Springer-Verlag, New York.
Donoho, D. L. (2000). High dimensional data analysis: The curses and blessings of dimensionality. In Math Challenges of 21st Century (2000). American Mathematical Society. Plenary speaker. Available at: http://www-stat.stanford.edu/donoho/Lectures/AMS2000/
Donoho, D. L. and Johnstone, I. M. (1998). Minimax estimation via wavelet shrinkage. The Annals of Statistics, 26(3):879–921.
Draper, N. R. and Smith, H. (1998). Applied regression analysis. Wiley Series in Probability and Statistics: Texts and References Section. John Wiley & Sons, New York, 3rd ed.
Dudoit, S., Fridlyand, J., and Speed, T. P. (2002). Comparison of discrimination methods for the classification of tumors using gene expression data. Journal of the American Statistical Association, 97(457):77–87.
Dudoit, S. and van der Laan, M. J. (2005). Asymptotics of cross-validated risk estimation in estimator selection and performance assessment. Statistical Methodology, 2(2):131–154.
Efron, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics, 7(1):1–26.
Efron, B. (2004). The estimation of prediction error: covariance penalties and cross-validation (with discussions). Journal of the American Statistical Association, 99(467):619–642.
Efron, B., Hastie, T. J., Johnstone, I., and Tibshirani, R. (2004). Least angle regression (with discussions). The Annals of Statistics, 32(2):407–499.
Fan, J. and Gijbels, I. (1996). Local polynomial modelling and its applications, Vol. 66 of Monographs on Statistics and Applied Probability. Chapman & Hall/CRC, London.
Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96(456):1348–1360.
Fan, J. and Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. In M. Sanz-Solé, J. Soria, J. L. Varona, and J. Verdera, eds., Proceedings of the International Congress of Mathematicians, Madrid 2006, Vol. III, pp. 595–622. European Mathematical Society Publishing House.
Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics, 32(3):928–961.
Furrer, R. and Bengtsson, T. (2006). Estimation of high-dimensional prior and posterior covariance matrices in Kalman filter variants. Journal of Multivariate Analysis. To appear.
Götze, F. (1993). Asymptotic approximation and the bootstrap. I.M.S. Bulletin, p. 305.
Götze, F. and Račkauskas, A. (2001). Adaptive choice of bootstrap sample sizes. In State of the art in probability and statistics (Leiden, 1999), Vol. 36 of IMS Lecture Notes Monograph Series, pp. 286–309. Institute of Mathematical Statistics, Beachwood, OH.
Greenshtein, E. (2006). Best subset selection, persistence in high-dimensional statistical learning and optimization under ℓ1-constraint. The Annals of Statistics, 34(5). To appear.
Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 10(6):971–988.
Györfi, L., Kohler, M., Krzyzak, A., and Walk, H. (2002). A distribution-free theory of nonparametric regression. Springer Series in Statistics. Springer-Verlag, New York.
Hall, P. (1992). The bootstrap and Edgeworth expansion. Springer Series in Statistics. Springer-Verlag, New York.
Hall, P., Horowitz, J. L., and Jing, B.-Y. (1995). On blocking rules for the bootstrap with dependent data. Biometrika, 82(3):561–574.
Hastie, T. J., Tibshirani, R., and Friedman, J. H. (2001). The elements of statistical learning. Springer Series in Statistics. Springer-Verlag, New York. Data mining, inference, and prediction.
Hoerl, A. E. and Kennard, R. W. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67.
Huang, J., Liu, N., Pourahmadi, M., and Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika, 93(1):85–98.
Hunter, D. R. and Li, R. (2005). Variable selection using MM algorithms. The Annals of Statistics, 33(4):1617–1642.
James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proceedings of the 4th Berkeley Sympos. Math. Statist. and Probability, Vol. I, pp. 361–379. Univ. California Press, Berkeley, Calif.
Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. The Annals of Statistics, 29(2):295–327.
Johnstone, I. M. and Lu, A. Y. (2006). Sparse principal component analysis. Journal of the American Statistical Association. To appear.
Johnstone, I. M. and Silverman, B. W. (2005). Empirical Bayes selection of wavelet thresholds. The Annals of Statistics, 33(4):1700–1752.
Kass, R. E. and Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430):773–795.
Kass, R. E. and Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. Journal of the American Statistical Association, 90(431):928–934.
Kosorok, M. and Ma, S. (2006). Marginal asymptotics for the “large p, small n” paradigm: with applications to microarray data. Unpublished.
Künsch, H. R. (1989). The jackknife and the bootstrap for general stationary observations. The Annals of Statistics, 17(3):1217–1241.
Ledoit, O. and Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. Journal of Multivariate Analysis, 88(2):365–411.
Li, K.-C. (1985). From Stein's unbiased risk estimates to the method of generalized cross validation. The Annals of Statistics, 13(4):1352–1377.
Li, K.-C. (1986). Asymptotic optimality of C_L and generalized cross-validation in ridge regression with application to spline smoothing. The Annals of Statistics, 14(3):1101–1112.
Li, K.-C. (1987). Asymptotic optimality for C_p, C_L, cross-validation and generalized cross-validation: discrete index set. The Annals of Statistics, 15(3):958–975.
Lugosi, G. and Nobel, A. B. (1999). Adaptive model selection using empirical complexities. The Annals of Statistics, 27(6):1830–1864.
Lugosi, G. and Vayatis, N. (2004). On the Bayes-risk consistency of regularized boosting methods. The Annals of Statistics, 32(1):30–55.
Mallows, C. L. (1973). Some comments on C_p. Technometrics, 15(4):661–675.
Mammen, E. (1992). When Does Bootstrap Work? Springer-Verlag, New York.
Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. The Annals of Statistics, 27(6):1808–1829.
Meinshausen, N. (2005). Lasso with relaxation. Unpublished.
Nadaraya, E. A. (1964). On estimating regression. Theory of Probability and Its Applications, 10:186–190.
Parzen, E. (1962). On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33:1065–1076.
Paul, D. (2005). Asymptotics of the leading sample eigenvalues for a spiked covariance model. Unpublished.
Politis, D. N. and Romano, J. P. (1994). Large sample confidence regions based on subsamples under minimal assumptions. The Annals of Statistics, 22(4):2031–2050.
Politis, D. N., Romano, J. P., and Wolf, M. (1999). Subsampling. Springer Series in Statistics. Springer-Verlag, New York.
Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: unconstrained parameterisation. Biometrika, 86(3):677–690.
Pourahmadi, M. (2000). Maximum likelihood estimation of generalised linear models for multivariate normal covariance matrix. Biometrika, 87(2):425–435.
Rissanen, J. (1984). Universal coding, information, prediction, and estimation. Institute of Electrical and Electronics Engineers. Transactions on Information Theory, 30(4):629–636.
Robert, C. P. and Casella, G. (2004). Monte Carlo statistical methods. Springer Texts in Statistics. Springer-Verlag, New York, 2nd ed.
Rosenblatt, M. (1956). Remarks on some nonparametric estimates of a density function. The Annals of Mathematical Statistics, 27:832–837.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2):461–464.
Shao, J. (1997). An asymptotic theory for linear model selection (with discussion). Statistica Sinica, 7(2):221–264.
Smith, M. and Kohn, R. (2002). Parsimonious covariance matrix estimation for longitudinal data. Journal of the American Statistical Association, 97(460):1141–1153.
Stone, C. J., Hansen, M. H., Kooperberg, C., and Truong, Y. K. (1997). Polynomial splines and their tensor products in extended linear modeling (with discussions). The Annals of Statistics, 25(4):1371–1470.
Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions (with discussions). Journal of the Royal Statistical Society. Series B, 36:111–147.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B, 58(1):267–288.
Tikhonov, A. N. (1943). On the stability of inverse problems. C. R. (Doklady) Acad. Sci. URSS (N.S.), 39:176–179.
Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. The Annals of Statistics, 32(1):135–166.
Vapnik, V. N. (1998). Statistical learning theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley & Sons, New York. A Wiley-Interscience Publication.
Wachter, K. W. (1978). The strong limits of random matrix spectra for sample matrices of independent elements. The Annals of Probability, 6(1):1–18.
Wang, Y. (2004). Model selection. In Handbook of computational statistics, pp. 437–466. Springer-Verlag, Berlin.
Watson, G. S. (1964). Smooth regression analysis. Sankhyā. Series A, 26:359–372.
Wigner, E. P. (1955). Characteristic vectors of bordered matrices with infinite dimensions. Annals of Mathematics. Second Series, 62:548–564.
Wu, W. B. and Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika, 90(4):831–844.
Zhang, H. H., Wahba, G., Lin, Y., Voelker, M., Ferris, M., Klein, R., and Klein, B. (2004). Variable selection and model building via likelihood basis pursuit. Journal of the American Statistical Association, 99(467):659–672.
Zhang, T. and Yu, B. (2005). Boosting with early stopping: convergence and consistency. The Annals of Statistics, 33(4):1538–1579.
Zou, H. and Hastie, T. J. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society. Series B, 67(2):301–320.
Additional references
Barron, A., Cohen, A., Dahmen, W., and DeVore, R. (2005). Approximation and learning by greedy algorithms. Manuscript.
Bunea, F., Tsybakov, A. B., and Wegkamp, M. H. (2005). Aggregation for Gaussian regression. The Annals of Statistics. Tentatively accepted.
Bunea, F., Tsybakov, A. B., and Wegkamp, M. H. (2006). Aggregation and sparsity via ℓ1 penalized least squares. In H. U. Simon and G. Lugosi, eds., Proceedings of 19th Annual Conference on Learning Theory (COLT 2006), Vol. 4005 of Lecture Notes in Artificial Intelligence, pp. 379–391. Springer-Verlag, Berlin-Heidelberg.
Juditsky, A., Nazin, A., Tsybakov, A., and Vayatis, N. (2005a). Recursive aggregation of estimators by mirror descent algorithm with averaging. Problems of Information Transmission, 41(4):368–384.
Juditsky, A., Rigollet, P., and Tsybakov, A. B. (2005b). Learning by mirror averaging. Preprint, Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6–Paris 7. https://hal.ccsd.cnrs.fr/ccsd-00014097.
Klemelä, J. (2006). Density estimation with stagewise optimization of the empirical risk. Manuscript.
Mannor, S., Meir, R., and Zhang, T. (2003). Greedy algorithms for classification: consistency, convergence rates, and adaptivity. Journal of Machine Learning Research, 4:713–742.
Mason, L., Baxter, J., Bartlett, P. L., and Frean, M. (2000). Functional gradient techniques for combining hypotheses. In A. J. Smola, P. L. Bartlett, B. Schölkopf, and D. Schuurmans, eds., Advances in Large Margin Classifiers, pp. 221–247. MIT Press, Cambridge, MA.
Tsybakov, A. B. (2003). Optimal rates of aggregation. In B. Schölkopf and M. Warmuth, eds., Proceedings of 16th Annual Conference on Learning Theory (COLT 2003) and 7th Annual Workshop on Kernel Machines, Vol. 2777 of Lecture Notes in Artificial Intelligence, pp. 303–313. Springer-Verlag, Berlin-Heidelberg.
Additional references
Boucheron, S., Bousquet, O., and Lugosi, G. (2005). Theory of classification: a survey of some recent advances. ESAIM. Probability and Statistics, 9:323–375 (electronic).
Koltchinskii, V. (2006). 2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization. The Annals of Statistics, 34(6). To appear.
Additional references
Bousquet, O. and Elisseeff, A. (2002). Stability and generalization. Journal of Machine Learning Research, 2(3):499–526.
Kutin, S. and Niyogi, P. (2002). Almost-everywhere algorithmic stability and generalization error. Technical Report TR-2002-03, Department of Computer Science, University of Chicago, Chicago, IL.
Additional references
Rivero, C. and Valdés, T. (2004). Mean based iterative procedures in linear models with general errors and grouped data. Scandinavian Journal of Statistics, 31(3):469–486.
Additional references
Antoniadis, A. and Fan, J. (2001). Regularized wavelet approximations (with discussion). Journal of the American Statistical Association, 96:939–967.
Chamberlain, G. and Rothschild, M. (1983). Arbitrage, factor structure, and mean-variance analysis on large asset markets. Econometrica, 51:1281–1304.
Chen, S., Donoho, D. L., and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20:33–61.
Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81:425–455.
Efromovich, S. (1999). Nonparametric Curve Estimation: Methods, Theory and Applications. Springer-Verlag, New York.
Fama, E. and French, K. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33:3–56.
Fan, J., Chen, Y., Chan, H. M., Tam, P., and Ren, Y. (2005a). Removing intensity effects and identifying significant genes for Affymetrix arrays in MIF-suppressed neuroblastoma cells. Proceedings of the National Academy of Sciences of the United States of America, 103:17751–17756.
Fan, J. and Jiang, J. (2005). Nonparametric inference for additive models. Journal of the American Statistical Association, 100:890–907.
Fan, J., Peng, H., and Huang, T. (2005b). Semilinear high-dimensional model for normalization of microarray data: a theoretical analysis and partial consistency. Journal of the American Statistical Association, 100:781–813.
Fan, J., Zhang, C. M., and Zhang, J. (2001). Generalized likelihood ratio statistics and Wilks phenomenon. The Annals of Statistics, 29:153–193.
Ross, S. (1976). The arbitrage theory of capital asset pricing. Journal of Economic Theory, 13:341–360.
Tibshirani, R., Hastie, T., Narasimhan, B., and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proceedings of the National Academy of Sciences of the United States of America, 99:6567–6572.
Additional references
Barron, A., Birgé, L., and Massart, P. (1999). Risk bounds for model selection via penalization. Probability Theory and Related Fields, 113(3):301–413.
Belitser, E. and Ghosal, S. (2003). Adaptive Bayesian inference on the mean of an infinite-dimensional normal distribution. The Annals of Statistics, 31(2):536–559.
Birgé, L. (2006). Statistical estimation with model selection (Brouwer lecture). Preprint.
Cai, T. T. and Low, M. G. (2004). An adaptation theory for nonparametric confidence intervals. The Annals of Statistics, 33(5):1805–1840.
Cai, T. T. and Low, M. G. (2005). On adaptive estimation of linear functionals. The Annals of Statistics, 33(5):2311–2343.
Cohen, A., Dahmen, W., Daubechies, I., and DeVore, R. (2001). Tree approximation and optimal encoding. Applied and Computational Harmonic Analysis. Time-Frequency and Time-Scale Analysis, Wavelets, Numerical Algorithms, and Applications, 11(2):192–226.
DeVore, R., Kerkyacharian, G., Picard, D., and Temlyakov, V. (2006). Approximation methods for supervised learning. Foundations of Computational Mathematics. The Journal of the Society for the Foundations of Computational Mathematics, 6(1):3–58.
DeVore, R. A. and Lorentz, G. G. (1993). Constructive approximation, Vol. 303 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin.
Ghosal, S., Ghosh, J. K., and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. The Annals of Statistics, 28(2):500–531.
Ghosal, S., Lember, J., and van der Vaart, A. (2003). On Bayesian adaptation. Acta Applicandae Mathematicae. An International Survey Journal on Applying Mathematics and Mathematical Applications, 79(1–2):165–175.
Ghosal, S. and van der Vaart, A. W. (2006). Convergence rates of posterior distributions for noniid observations. The Annals of Statistics, 34.
Hoffmann, M. and Lepski, O. (2002). Random rates in anisotropic regression. The Annals of Statistics, 30(2):325–396.
Huang, T.-M. (2004). Convergence rates for posterior distributions and adaptive estimation. The Annals of Statistics, 32(4):1556–1593.
Juditsky, A. and Lambert-Lacroix, S. (2003). Nonparametric confidence set estimation. Mathematical Methods of Statistics, 12(4):410–428 (2004).
Juditsky, A., Nazin, A., Tsybakov, A., and Vayatis, N. (2005). Recursive aggregation of estimators by mirror descent algorithm with averaging. Problems of Information Transmission, 41(4):368–384.
Juditsky, A. and Nemirovski, A. (2000). Functional aggregation for nonparametric regression. The Annals of Statistics, 28(3):681–712.
Keles, S., van der Laan, M., and Dudoit, S. (2004). Asymptotically optimal model selection method with right censored outcomes. Bernoulli. Official Journal of the Bernoulli Society for Mathematical Statistics and Probability, 10(6):1011–1037.
Kerkyacharian, G. and Picard, D. (2004). Entropy, universal coding, approximation, and bases properties. Constructive Approximation. An International Journal for Approximations and Expansions, 20(1):1–37.
Lember, J. (2004). On Bayesian adaptation. Preprint.
Li, L., Tchetgen, E., Robins, J., and van der Vaart, A. (2005). Robust inference with higher order influence functions: Parts I and II. Joint Statistical Meetings, Minneapolis, Minnesota.
Murphy, S. A. and van der Vaart, A. W. (2000). On profile likelihood. Journal of the American Statistical Association, 95(450):449–485.
Nemirovski, A. (2000). Topics in non-parametric statistics. In Lectures on probability theory and statistics (Saint-Flour, 1998), Vol. 1738 of Lecture Notes in Mathematics, pp. 85–277. Springer-Verlag, Berlin.
Robins, J. M. (1997). Causal inference from complex longitudinal data. In M. Berkane, ed., Latent variable modeling and applications to causality (Los Angeles, CA, 1994), Vol. 120 of Lecture Notes in Statistics, pp. 69–117. Springer-Verlag, New York.
Robins, J. M. and van der Vaart, A. W. (2006). Adaptive nonparametric confidence sets. The Annals of Statistics, 34(1):229–253.
van der Laan, M. J. and Robins, J. M. (2003). Unified methods for censored longitudinal data and causality. Springer Series in Statistics. Springer-Verlag, New York.
van der Vaart, A. W. (2002). Semiparametric statistics. In Lectures on probability theory and statistics (Saint-Flour, 1999), Vol. 1781 of Lecture Notes in Mathematics, pp. 331–457. Springer-Verlag, Berlin.
Yang, Y. (2000). Mixing strategies for density estimation. The Annals of Statistics, 28(1):75–87.
Yang, Y. (2004). Aggregating regression procedures to improve performance. Bernoulli. Official Journal of the Bernoulli Society for Mathematical Statistics and Probability, 10(1):25–47.
Additional references
Bickel, P. J. and Freedman, D. A. (1981). Some asymptotic theory for the bootstrap. The Annals of Statistics, 9(6):1196–1217.
Bickel, P. J. and Ritov, Y. (2000). Non- and semiparametric statistics: compared and contrasted. Journal of Statistical Planning and Inference, 91(2):209–228. Prague Workshop on Perspectives in Modern Statistical Inference: Parametrics, Semi-parametrics, Non-parametrics (1998).
Cox, D. D. (1993). An analysis of Bayesian inference for nonparametric regression. The Annals of Statistics, 21(2):903–923.
Devroye, L. P. and Wagner, T. J. (1979). Distribution-free performance bounds for potential function rules. Institute of Electrical and Electronics Engineers. Transactions on Information Theory, 25(5):601–604.
Freedman, D. (1999). On the Bernstein-von Mises theorem with infinite-dimensional parameters. The Annals of Statistics, 27(4):1119–1140.
Gray, H. L. and Schucany, W. R. (1972). The generalized jackknife statistic, Vol. 1 of Statistics Textbooks and Monographs. Marcel Dekker Inc., New York.
Hampel, F. R., Ronchetti, E. M., Rousseeuw, P. J., and Stahel, W. A. (1986). Robust statistics. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons Inc., New York. The approach based on influence functions.
Hodges, J. L., Jr. (1967). Efficiency in normal samples and tolerance of extreme values for some estimates of location. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. I: Statistics, pp. 163–186. Univ. California Press, Berkeley, Calif.
Huber, P. J. (1964). Robust estimation of a location parameter. Ann. Math. Statist., 35:73–101.
Kleijn, B. and van der Vaart, A. (2005). The Bernstein-von Mises theorem under misspecification. Unpublished.
Koltchinskii, V. (2006). 2004 IMS Medallion Lecture: Local Rademacher complexities and oracle inequalities in risk minimization. The Annals of Statistics, 34(6). To appear.
Li, L., Tchetgen, E., Robins, J., and van der Vaart, A. (2005). Robust inference with higher order influence functions: Parts I and II. Joint Statistical Meetings, Minneapolis, Minnesota.
van der Vaart, A. W. (1998). Asymptotic statistics, Vol. 3 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.
Wahba, G. (1990). Spline models for observational data, Vol. 59 of CBMS-NSF Regional Conference Series in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.
About this article
Cite this article
Bickel, P.J., Li, B., Tsybakov, A.B. et al. Regularization in statistics. Test 15, 271–344 (2006). https://doi.org/10.1007/BF02607055
DOI: https://doi.org/10.1007/BF02607055
Key Words
- Regularization
- linear regression
- nonparametric regression
- boosting
- covariance matrix
- principal component
- bootstrap
- subsampling
- model selection