Volume 29, Issue 1, pp 51–62

Calibrating Models in Economic Evaluation

A Comparison of Alternative Measures of Goodness of Fit, Parameter Search Strategies and Convergence Criteria
Original Research Article


Background: The importance of assessing the accuracy of health economic decision models is widely recognized. Many applied decision models implicitly assume that identifying relevant values for each of a model’s input parameters is sufficient to prove the model’s accuracy. However, values that are individually plausible can combine into infeasible parameter sets. The selection of such infeasible combinations is most likely in probabilistic sensitivity analysis (PSA), where parameter values are drawn from independently specified probability distributions for each model parameter. Model calibration involves identifying sets of input parameter values that produce model outputs that best predict observed data.

Methods: An empirical comparison of three key calibration issues is presented: the applied measure of goodness of fit (GOF); the search strategy for selecting sets of input parameter values; and the convergence criteria for determining acceptable GOF. The comparisons are presented in the context of probabilistic calibration, a widely applicable approach to calibration that can be easily integrated with PSA. The appendix provides a user’s guide to probabilistic calibration, with the reader invited to download the Microsoft® Excel-based model reported in this article.
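The three issues named above can be illustrated with a minimal sketch. It is not the Excel-based model reported in this article: the two-parameter toy model, the observed counts, the uniform sampling ranges and the names `model_output`, `chi_squared`, `log_likelihood` and `calibrate` are all assumptions made for illustration. It shows a random (unguided) search, two GOF measures, and a convergence threshold applied to the chi-squared value.

```python
import math
import random

# Hypothetical calibration targets: (observed events, cohort size) at years 1-3.
OBSERVED = [(10, 100), (19, 100), (27, 100)]


def model_output(p_early, p_late):
    """Toy model (an assumption): cumulative event probability at years 1, 2 and 3."""
    return [1 - (1 - p_early) * (1 - p_late) ** (t - 1) for t in (1, 2, 3)]


def chi_squared(predicted, observed):
    """Chi-squared GOF: sum of (observed - expected)^2 / expected; lower is better."""
    return sum((k - p * n) ** 2 / (p * n) for p, (k, n) in zip(predicted, observed))


def log_likelihood(predicted, observed):
    """Binomial log-likelihood GOF (constant terms dropped); higher is better.

    Treats the three targets as independent, which is a simplification.
    """
    return sum(
        k * math.log(p) + (n - k) * math.log(1 - p)
        for p, (k, n) in zip(predicted, observed)
    )


def calibrate(n_draws=5000, threshold=1.0, seed=1):
    """Random search: keep sampled parameter sets whose chi-squared is within threshold."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        # Independent draws, as in a conventional PSA (ranges are illustrative).
        params = (rng.uniform(0.01, 0.30), rng.uniform(0.01, 0.30))
        gof = chi_squared(model_output(*params), OBSERVED)
        if gof <= threshold:  # convergence criterion for acceptable GOF
            accepted.append((params, gof))
    return accepted


if __name__ == "__main__":
    strict = calibrate(threshold=1.0)
    broad = calibrate(threshold=5.0)
    print(len(strict), "sets accepted at the strict threshold")
    print(len(broad), "sets accepted at the broad threshold")
```

In this sketch the accepted parameter sets, rather than the independent draws, would then be sampled in the PSA. Raising `threshold` admits more sets, moving the analysis back towards the non-calibrated approach.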

Results: The calibrated models consistently provided higher mean estimates of the models’ output parameter, illustrating the potential gain in accuracy derived from calibrating decision models. Model uncertainty was also reduced. The chi-squared GOF measure differentiated between the accuracy of different parameter sets to a far greater degree than the likelihood GOF measure. The guided search strategy produced higher mean estimates of the models’ output parameter, as well as a narrower range of predicted output values, which may reflect greater precision in the identification of candidate parameter sets or more limited coverage of the parameter space. The broader convergence threshold resulted in lower mean estimates of the models’ output, and slightly wider ranges, which were closer to the outputs associated with the non-calibrated approach.

Conclusions: Probabilistic calibration provides a broadly applicable method that will improve the relevance of health economic decision models, and simultaneously reduce model uncertainty. The analyses reported in this paper inform the more efficient and accurate application of calibration methods for health economic decision models.



No sources of funding were used to conduct this study or prepare this manuscript. The authors have no conflicts of interest that are directly relevant to the content of this article.

Supplementary material

40273_2012_29010051_MOESM1_ESM.xls (approximately 2.63 MB)



Copyright information

© Springer International Publishing AG 2011

Authors and Affiliations

  1. Department of Public Health, University of Adelaide, Adelaide, Australia
  2. London School of Hygiene and Tropical Medicine, London, UK
