Abstract
We consider the estimation of a parameter of interest when some model assumptions may be incorrect. The parameter of interest is defined in a model-independent manner, and the estimating procedure selects the model with the smallest mean squared error (MSE), as estimated by a proposed MSE estimator. The proposed MSE estimator combines a nonparametric bootstrap with plug-in estimation. It requires at least one consistent estimator whose systematic bias vanishes quickly (\(\sqrt{n}\) mean convergence). The MSE estimator is not tied to a single set of model assumptions (e.g., a class of parametric models), and thus it works across various sets of possibly nonnested classes of statistical models. The derived large-sample properties provide theoretical justification for its use and allow estimation of the probability that a candidate estimator has the smallest MSE in a pool of candidate estimators. Multiple simulation studies illustrate the performance of the proposed procedure under various scenarios. A real data example highlights its practical use when a single model must be selected from several conceptually different statistical modeling techniques (parametric regression, Cox regression, stratified Cox regression, regression on pseudo-values) and other model selection approaches are not applicable.
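The selection rule described above can be illustrated with a minimal sketch. It is not the paper's estimator: the abstract does not give the formula, so the code assumes one simple form, a bootstrap variance plus a plug-in squared bias measured against a reference estimator that is taken to be consistent. The function names (`bootstrap_variance`, `estimated_mse`, `select_estimator`), the choice of the sample mean as the reference, and the toy data are all hypothetical.

```python
import math
import random
import statistics


def bootstrap_variance(data, estimator, n_boot, rng):
    """Nonparametric bootstrap variance: resample with replacement, re-estimate."""
    n = len(data)
    replicates = [estimator(rng.choices(data, k=n)) for _ in range(n_boot)]
    return statistics.pvariance(replicates)


def estimated_mse(data, estimator, reference, n_boot, rng):
    """Assumed form: bootstrap variance + squared gap to the reference estimate."""
    variance = bootstrap_variance(data, estimator, n_boot, rng)
    plug_in_bias = estimator(data) - reference(data)
    return variance + plug_in_bias ** 2


def select_estimator(data, candidates, reference_name, n_boot=500, seed=0):
    """Return the candidate name with the smallest estimated MSE, plus all scores."""
    rng = random.Random(seed)
    reference = candidates[reference_name]
    scores = {
        name: estimated_mse(data, est, reference, n_boot, rng)
        for name, est in candidates.items()
    }
    return min(scores, key=scores.get), scores


# Illustrative data: 200 evenly spaced quantiles of an Exponential(1) law,
# so the target (the mean) is close to 1 while the median is close to log(2).
data = [-math.log(1 - (i + 0.5) / 200) for i in range(200)]
candidates = {"mean": statistics.mean, "median": statistics.median}
best, scores = select_estimator(data, candidates, reference_name="mean")
print(best, scores)
```

In this toy setting the median is a biased estimator of the mean, so its plug-in bias term dominates its score. Candidates built on different model classes can be added to `candidates` without changing the rule, since each one is scored only through its own estimate and its bootstrap replicates.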
Cite this article
Tarima, S. Statistical estimation in the presence of possibly incorrect model assumptions. J Stat Theory Pract 11, 449–467 (2017). https://doi.org/10.1080/15598608.2017.1299056