Testing Simulation Models Using Frequentist Statistics
One approach to validating simulation models is to formally compare model outputs with independent data. We consider such model validation from the point of view of frequentist statistics. A range of estimates and goodness-of-fit tests have been advanced for this purpose. We review these approaches and demonstrate that some of the tests are difficult to interpret because they rely on the null hypothesis that the model is similar to the observations. This reliance creates two unpleasant possibilities: a model could be spuriously validated when the data are too few to detect a difference, or inappropriately rejected when the data are so many that even a trivial difference is statistically significant. Furthermore, these tests do not allow a principled declaration of what a reasonable level of difference would be, given the purposes to which the model will be put. We consider equivalence tests and demonstrate that they do not suffer from these shortcomings. We provide two case studies to illustrate the claims of the chapter.
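The contrast between the two testing approaches can be sketched numerically. The following is a minimal illustration, not the chapter's own procedure: it assumes paired differences (observation minus prediction), uses a large-sample normal approximation in place of a t-test, and applies two one-sided tests (TOST) as the equivalence test. The function names, the equivalence margin of 1.5, and both data sets are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def nhst_p_value(diffs):
    """Two-sided p-value for H0: mean difference = 0 (normal approximation)."""
    se = stdev(diffs) / sqrt(len(diffs))
    z = mean(diffs) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def tost_p_value(diffs, margin):
    """Equivalence p-value for H0: |mean difference| >= margin (TOST)."""
    se = stdev(diffs) / sqrt(len(diffs))
    d = mean(diffs)
    p_lower = 1 - NormalDist().cdf((d + margin) / se)  # H0: mean <= -margin
    p_upper = NormalDist().cdf((d - margin) / se)      # H0: mean >= +margin
    return max(p_lower, p_upper)  # both one-sided tests must reject


# Hypothetical data: few, noisy differences with a mean bias of 1.0.
few = [3.0, -2.0, 4.0, -1.0] * 2
# Hypothetical data: many differences with a small mean bias of about 0.2.
many = [1.2, -0.8] * 200

margin = 1.5  # assumed largest difference that is still acceptable

for label, diffs in (("few", few), ("many", many)):
    print(label, "NHST p =", nhst_p_value(diffs),
          "TOST p =", tost_p_value(diffs, margin))
```

With the small sample, the conventional test fails to reject the hypothesis of no difference, which could be read as validation, while the equivalence test declines to declare equivalence. With the large sample, the conventional test rejects the model over a small bias, while the equivalence test concludes that the bias lies within the stated margin. The normal approximation is crude for a sample of eight, so a t-based version would be preferable in practice.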
Keywords: Equivalence testing; Null hypothesis significance testing; Statistical models; Model validation
This study is supported in part by the Centre of Excellence for Biosecurity Risk Analysis, School of BioSciences, University of Melbourne, Australia. Thoughtful review comments by Lori Dalton, Steve Lane, Anca Hanea, James Camac, and the two editors have greatly improved this chapter.