Validation Benchmarks and Related Metrics
This chapter proposes benchmarking as an important, versatile and promising method for validating simulation models that have an empirical target. Simulation models that only explore the consequences of theoretical assumptions are therefore excluded. The chapter develops a conceptual framework and a descriptive theory of benchmarking in simulation validation. Benchmarks are drawn from outstanding experimental or observational data, stylized facts or other characteristics of the target. These sources are outstanding because they are more effective, more reliable or more efficient than other such data, stylized facts or characteristics. Benchmarks are set in a benchmarking process, which offers a pathway towards establishing norms and standards in simulation validation. Benchmarks are indispensable in maintaining large simulation systems, e.g. for automatic quality checking of large-scale forecasts and when forecasting systems are upgraded.
Keywords: Validation benchmarks; Touchstone; Yardstick; Engineering reference standard; Benchmarking; Benchmarking metrics
The author thanks Claus Beisbart and William Oberkampf for helpful discussions concerning this manuscript.