Journal of Statistical Theory and Practice, Volume 11, Issue 3, pp 449–467

Statistical estimation in the presence of possibly incorrect model assumptions

  • Sergey Tarima


The problem of estimating a parameter of interest when some model assumptions may be incorrect is considered. The parameter of interest is defined in a model-independent manner, and the estimating procedure selects the model with the smallest mean square error (MSE), as estimated by a proposed MSE estimator. This MSE estimator combines a nonparametric bootstrap with plug-in estimation. It requires at least one consistent estimator with a quickly disappearing systematic bias (\(\sqrt n \) mean convergence). This estimator is not tied to a single set of model assumptions (e.g., a class of parametric models), and thus it works across various sets of possibly nonnested classes of statistical models. The derived large sample properties provide theoretical justification for its use and allow estimation of the probability that this estimator has the smallest MSE in a pool of candidate estimators. Multiple simulation studies illustrate the performance of the proposed procedure under various scenarios. A real data example highlights its practical use when a single model is selected from several conceptually different statistical modeling techniques (parametric regression, Cox regression, stratified Cox regression, regression on pseudo-values) and other model selection approaches are not applicable.
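
The selection idea described in the abstract can be illustrated with a minimal Python sketch. It is not the estimator derived in the paper, whose exact form is not given here. As an assumption made only for illustration, the MSE of each candidate is estimated as its bootstrap variance plus a plug-in squared bias, where the bias is measured against a reference estimator assumed to be consistent and is corrected by the bootstrap variance of the difference. The function names `estimate_mse` and `select_estimator` and all parameters are hypothetical.

```python
import numpy as np

def estimate_mse(data, candidates, reference, n_boot=500, seed=0):
    """Illustrative MSE estimates: bootstrap variance + plug-in squared bias.

    `candidates` maps a name to a function of a 1-D sample.
    `reference` names the candidate assumed (not verified) to be consistent.
    """
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(data)
    names = list(candidates)

    # Nonparametric bootstrap: resample with replacement, re-estimate.
    boot = np.empty((n_boot, len(names)))
    for b in range(n_boot):
        sample = data[rng.integers(0, n, size=n)]
        boot[b] = [candidates[name](sample) for name in names]

    point = np.array([candidates[name](data) for name in names])
    ref = names.index(reference)

    mse = {}
    for j, name in enumerate(names):
        variance = boot[:, j].var(ddof=1)
        # Plug-in squared bias relative to the reference estimator,
        # reduced by the variance of the difference (an assumed correction)
        # and truncated at zero.
        diff_var = (boot[:, j] - boot[:, ref]).var(ddof=1)
        bias_sq = max(0.0, (point[j] - point[ref]) ** 2 - diff_var)
        mse[name] = variance + bias_sq
    return mse

def select_estimator(data, candidates, reference, **kwargs):
    """Return the candidate with the smallest estimated MSE."""
    mse = estimate_mse(data, candidates, reference, **kwargs)
    best = min(mse, key=mse.get)
    return best, candidates[best](np.asarray(data, dtype=float)), mse

# Example: the target is the population mean of a skewed distribution,
# so the sample median is a biased candidate.
rng = np.random.default_rng(1)
data = rng.exponential(scale=1.0, size=400)
candidates = {"mean": np.mean, "median": np.median}
best, estimate, mse = select_estimator(data, candidates, reference="mean")
```

In this toy setting the reference estimator has zero estimated bias by construction, so its estimated MSE equals its bootstrap variance. A biased candidate is preferred only when its variance reduction outweighs its estimated squared bias.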


Keywords

Parameter estimation, mean square error, model misspecification

AMS Subject Classification

62-07 62N 62G99 62J99 





Copyright information

© Grace Scientific Publishing 2017

Authors and Affiliations

  1. Division of Biostatistics, Medical College of Wisconsin, Milwaukee, USA
