Statistics and Computing, Volume 22, Issue 1, pp 237–249

Prediction-based regularization using data augmented regression



The role of regularization is to control fitted model complexity and variance by penalizing (or constraining) models to be in an area of model space that is deemed reasonable, thus facilitating good predictive performance. This is typically achieved by penalizing a parametric or non-parametric representation of the model. In this paper we advocate instead the use of prior knowledge or expectations about the predictions of models for regularization. This has the twofold advantage of allowing a more intuitive interpretation of penalties and priors and explicitly controlling model extrapolation into relevant regions of the feature space. This second point is especially critical in high-dimensional modeling situations, where the curse of dimensionality implies that new prediction points usually require extrapolation. We demonstrate that prediction-based regularization can, in many cases, be stochastically implemented by simply augmenting the dataset with Monte Carlo pseudo-data. We investigate the range of applicability of this implementation. An asymptotic analysis of the performance of Data Augmented Regression (DAR) in parametric and non-parametric linear regression, and in nearest neighbor regression, clarifies the regularizing behavior of DAR. We apply DAR to simulated and real data, and show that it is able to control the variance of extrapolation, while maintaining, and often improving, predictive accuracy.
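As an illustration of the idea of augmenting the dataset with Monte Carlo pseudo-data, the following is a minimal sketch for ordinary linear regression. It is not taken from the paper. The function name `dar_linear_fit`, the uniform sampling box, the constant prior prediction (the mean response), and the weighting scheme controlled by `weight` are all assumptions made for this example.

```python
import numpy as np

def dar_linear_fit(X, y, n_pseudo=200, weight=1.0, prior_fn=None, seed=0):
    """Hypothetical sketch of data augmented linear regression.

    Pseudo-points are drawn uniformly from a box around the observed
    features (an assumption), given responses from a prior prediction
    function, and appended to the data before a weighted least-squares fit.
    Returns coefficients [intercept, slope_1, ..., slope_d].
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)

    # Sampling region: the observed range, widened by half its span on each
    # side so that pseudo-data also cover extrapolation regions.
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = hi - lo
    X_pseudo = rng.uniform(lo - 0.5 * span, hi + 0.5 * span,
                           size=(n_pseudo, X.shape[1]))

    # Prior expectation about predictions; defaults to the mean response.
    if prior_fn is None:
        y_pseudo = np.full(n_pseudo, y.mean())
    else:
        y_pseudo = np.asarray(prior_fn(X_pseudo), dtype=float)

    # Each real observation has weight 1; the pseudo-data share a total
    # weight of `weight`, which plays the role of a regularization strength.
    w = np.concatenate([np.ones(len(y)), np.full(n_pseudo, weight / n_pseudo)])
    X_aug = np.vstack([X, X_pseudo])
    y_aug = np.concatenate([y, y_pseudo])

    # Weighted least squares via row scaling by sqrt(weights).
    A = np.column_stack([np.ones(len(y_aug)), X_aug])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y_aug * sw, rcond=None)
    return coef
```

With `weight=0` the fit reduces to ordinary least squares on the original data. As `weight` grows, the fitted predictions are pulled toward the prior prediction over the sampled region, so in this sketch the slopes shrink toward zero.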


Keywords: Regression, Nearest-neighbor, Extrapolation, Machine learning, Regularization





Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, USA
  2. School of Mathematical Sciences, Tel Aviv University, Tel Aviv, Israel
