UKPDS Outcomes Model 2: a new version of a model to simulate lifetime health outcomes of patients with type 2 diabetes mellitus using data from the 30 year United Kingdom Prospective Diabetes Study: UKPDS 82
The aim of this project was to build a new version of the United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model (UKPDS-OM1), a patient-level simulation tool for predicting lifetime health outcomes of people with type 2 diabetes mellitus.
Data from 5,102 UKPDS patients from the 20 year trial and the 4,031 survivors entering the 10 year post-trial monitoring period were used to derive parametric proportional hazards models predicting absolute risk of diabetes complications and death. We re-estimated the seven original event equations and estimated new equations for diabetic ulcer and some second events. The additional data permitted inclusion of new risk factor predictors such as estimated GFR. We also developed four new equations for all-cause mortality. Internal validation of model predictions of cumulative incidence of all events and death was carried out and a contemporary patient-level dataset was used to compare 10 year predictions from the original and the new models.
Model equations were based on a median 17.6 years of follow-up and up to 89,760 patient-years of data, providing double the number of events, greater precision and a larger number of significant covariates. The new model, UKPDS-OM2, is internally valid over 25 years and predicts lower event rates for complications than the existing model.
The new UKPDS-OM2 has significant advantages over the existing model, as it captures more outcomes, is based on longer follow-up data, and more comprehensively captures the progression of diabetes. Its use will permit detailed and reliable lifetime simulations of key health outcomes in people with type 2 diabetes mellitus.
Keywords: Complications, Life expectancy, Patient-level simulation, Risk modelling, Survival, Type 2 diabetes mellitus
Abbreviations
CHF: Congestive heart failure
IHD: Ischaemic heart disease
LDS: Lipids in Diabetes Study
OM1: Outcomes Model version 1
OM2: Outcomes Model version 2
PVD: Peripheral vascular disease
QALY: Quality-adjusted life years
UKPDS: United Kingdom Prospective Diabetes Study
Computer simulation is a method of modelling the progression of type 2 diabetes mellitus and predicting long-term outcomes of the disease. Since the publication of the first United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model (UKPDS-OM1), the use of simulation modelling in diabetes has increased, with at least eight models in use worldwide [2, 3], many of which use the published equations from UKPDS-OM1. UKPDS-OM1 is a multi-application model that has been used for purposes including cost-effectiveness analyses [4, 5] and prediction of life expectancy.
The UKPDS-OM1 has been tested alongside several other diabetes simulation models at Mount Hood Challenge meetings. A general conclusion was that the models performed reasonably well in terms of predicting the relative risk of interventions vs control treatments, but less well in predicting absolute risk. Additionally, a temporal validation study found that UKPDS-OM1 over-predicted the probability of events for high-risk subgroups.
Model building is an iterative process and models need to be updated as new information becomes available. The UKPDS-OM1 was based on trial data collected in the UKPDS up until 1997. Additional information collected during the UKPDS 10 year post-trial monitoring (PTM) period provided an opportunity to update the simulation model and to incorporate data on new risk factors and outcomes that were unavailable when the UKPDS-OM1 was constructed.
Our aim was to build a new model, Outcomes Model version 2 (UKPDS-OM2), based on the larger dataset. The additional data were more recent and also more clinically relevant, as participants were no longer in a clinical trial. This involved: (1) re-estimating, over a longer duration of follow-up, the seven original risk equations for complications (myocardial infarction [MI], ischaemic heart disease [IHD], stroke, congestive heart failure [CHF], amputation, blindness and renal failure); (2) estimating new equations, not in the original model, for diabetic ulcer and some second events; (3) developing new equations for all-cause mortality; (4) exploring the use of new risk factors, such as microalbuminuria, which have been shown to be predictive of diabetes-related complications.
We also present internal validation of the UKPDS-OM2 over 25 years of follow-up, carry out a sensitivity analysis and compare predictions from the original and new models using a contemporary patient-level input dataset.
Derivation of risk model equations
Study subjects and measurement of outcomes
Model equations were based on patient-level data for the 5,102 UKPDS participants with newly diagnosed type 2 diabetes mellitus, aged 25–65 years, recruited between 1977 and 1991. These patients were followed until the trial concluded in 1997. All 4,031 surviving participants entered the 10 year PTM observational study, during which time they returned to their community or hospital-based diabetes care providers, with no attempt made to maintain their previous allocated trial regimen. All patients provided written informed consent. Approval was obtained from the ethics committees at all 23 clinical centres, and the study conformed to the Declaration of Helsinki guidelines.
During the main trial, patients were seen three or four times each year in UKPDS clinics. During PTM, patients were seen annually for 5 years in UKPDS clinics, with continued standardised collection of outcome data plus clinical examination every 3 years. In years 6–10, patient and general practitioner questionnaires were used to follow patients remotely, since funding for clinic visits was not available. The vital status of all patients who were still living in the UK was obtained from the Office for National Statistics.
Outcomes were adjudicated exactly as in the original trial, by the UKPDS endpoint committee, which was blinded to study groups. The definitions of the outcomes used in the UKPDS-OM2 match adjudicated trial endpoints and the original UKPDS-OM1 outcomes, except for vascular cardiac events, where CHF and other IHD now include both fatal and non-fatal events. Additional outcomes of diabetic ulcer of the lower limb, and second events for MI, stroke and amputation, were derived. Definitions of outcomes by ICD-9 are detailed in electronic supplementary material (ESM) Table 1.
Clinical risk factors
We used a set of clinical risk factors as candidate predictor variables that were similar to those used in UKPDS-OM1 (i.e. systolic blood pressure [SBP], HbA1c, lipids) but with the following modifications: HDL and LDL cholesterol were included separately; BMI, peripheral vascular disease (PVD) and atrial fibrillation used updated rather than just baseline values. We also included risk factors shown in recent studies to be potentially predictive of diabetic complications: micro- or macroalbuminuria, estimated GFR (eGFR), heart rate and white blood cell count. Haemoglobin was also included, as it has been shown to be an independent predictor of mortality in patients with CHF.
Risk equations for first occurrence of eight diabetes complications and three additional second event equations for MI, stroke and amputation were developed. Multivariate parametric proportional hazards survival models were derived with time to event determined in continuous time from onset of diabetes, using the censor date of death or the date of last contact with the patient. The set of candidate covariates for each equation included time invariant factors (e.g. sex, age at diagnosis of diabetes), time varying clinical risk factors (e.g. HbA1c and SBP) and time varying comorbidities (e.g. history of stroke). One year lagged values were used for clinical risk factors to avoid possible confounding of risk factors measured post complication. A full description of risk factor covariates is presented in ESM Table 2.
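As an illustration of how a parametric proportional hazards equation of this kind can yield an absolute annual risk, the sketch below assumes a Weibull baseline hazard with cumulative hazard H(t) = exp(lam + xb) * t**rho, where t is years since diagnosis. The function names and the parameter values in the usage note are hypothetical and are not the published UKPDS-OM2 coefficients, which are given in the ESM.

```python
import math

def weibull_cum_hazard(t, lam, rho, xb):
    """Cumulative hazard at t years since diagnosis (assumed Weibull form).

    lam: constant term, rho: shape parameter,
    xb: sum of coefficient * covariate value for one patient.
    """
    return math.exp(lam + xb) * t ** rho

def annual_event_probability(t, lam, rho, xb):
    """Probability of an event in (t, t + 1], given no event up to t."""
    h_start = weibull_cum_hazard(t, lam, rho, xb)
    h_end = weibull_cum_hazard(t + 1.0, lam, rho, xb)
    return 1.0 - math.exp(h_start - h_end)
```

For example, `annual_event_probability(10, lam=-6.0, rho=1.2, xb=0.5)` would give the one-year risk for a hypothetical patient 10 years after diagnosis. With `rho = 1` the expression reduces to the exponential model, in which the annual risk does not depend on duration of diabetes.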
Risk equations for all-cause mortality were developed to take account of patients’ complication status in different years. These included logistic models to capture the high mortality in the year of a complication, and Gompertz proportional hazards survival models for years in which there were no complications. Thus, in any patient-year, only one of the four mutually exclusive equations for prediction of absolute annual risk of mortality would be used. Preliminary analysis showed that all complications except blindness and ulcer were associated with mortality in the current year (p < 0.05). Hence, logistic regression models were used to estimate the probability of death in the year of any MI, stroke, amputation (first or second), CHF, IHD or renal failure. Based on tests of model fit, and to maximise transparency, we derived two separate logistic equations for patients with and without a history of complications.
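The selection of a single mortality equation per patient-year can be sketched as follows. This is a hypothetical illustration: the labels are invented, and the assumption that the two Gompertz equations are also split by history of complications is ours (the text here states this split explicitly only for the logistic equations). The exact definition of "history" is given in the ESM, not here.

```python
import math

# Complications reported in the text as associated with same-year mortality.
EVENTS_WITH_SAME_YEAR_MORTALITY = {
    "MI", "stroke", "amputation", "CHF", "IHD", "renal failure",
}

def select_mortality_equation(events_this_year, has_history):
    """Return the label of the one equation that applies in this patient-year.

    Assumption: both the logistic and the Gompertz pair are split by
    whether the patient has a history of complications.
    """
    event_year = any(e in EVENTS_WITH_SAME_YEAR_MORTALITY for e in events_this_year)
    if event_year:
        return "logistic_history" if has_history else "logistic_no_history"
    return "gompertz_history" if has_history else "gompertz_no_history"

def logistic_probability(linear_predictor):
    """Probability of death in an event year from a logistic equation."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))
```

Blindness and ulcer are deliberately absent from the set, so a year containing only those events falls through to a Gompertz equation.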
The two remaining equations for death are multivariate Gompertz proportional hazards survival models to estimate the hazard of death in years without any of the complications defined above. Time to death was determined in continuous time, with time at risk modelled by a patient’s current age in order to allow extrapolation beyond the observed follow-up period. The censor date for deaths was 30 September 2007, the date of linkage to the national mortality database from the Office for National Statistics, or the date of emigration from the UK, which represents date lost to follow-up in the national statistics.
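A Gompertz model with current age as the time scale can be turned into an annual probability of death in the same way as any parametric hazard model. The sketch below assumes a hazard of the form h(a) = exp(lam + xb) * exp(phi * a), where a is current age; the symbols and any values passed in are hypothetical, not the published coefficients.

```python
import math

def gompertz_cum_hazard(age, lam, phi, xb):
    """Cumulative hazard up to a given age under an assumed Gompertz form."""
    return math.exp(lam + xb) / phi * (math.exp(phi * age) - 1.0)

def annual_death_probability(age, lam, phi, xb):
    """Probability of dying between age and age + 1, given alive at age."""
    h_start = gompertz_cum_hazard(age, lam, phi, xb)
    h_end = gompertz_cum_hazard(age + 1.0, lam, phi, xb)
    return 1.0 - math.exp(h_start - h_end)
```

Because the hazard depends on age rather than on time since study entry, the same function can be evaluated at ages beyond those observed during follow-up, which is the extrapolation property the text refers to.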
Proportional hazards models for complications and death were derived using a consistent process to select significant covariates from the candidate risk predictors. First, binary covariates were excluded from particular event equations if cross-tabulations indicated that they occurred rarely (fewer than ten occurrences). Then a multivariate model was fitted with all remaining covariates, and any with p > 0.3 were dropped. The significant covariates of the final risk model were selected in a backwards stepwise regression at p < 0.05. The parametric form of the underlying hazard was examined graphically and models were chosen by consideration of Akaike’s information criterion for exponential, Weibull and Gompertz parametric forms. The proportional hazards assumption was tested by examination of Schoenfeld residuals in comparable Cox models and through Cox–Snell semi-log plots. If the effect of any covariate was identified as non-linear, it was modelled either as a categorical variable or as a continuous spline function with suitable knot points. We specifically investigated any U-shaped HbA1c effect using continuous splines. All analyses were carried out using Stata version 12.0 software (Stata, College Station, TX, USA).
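Akaike’s information criterion is AIC = 2k - 2 ln L, where k is the number of estimated parameters and ln L is the maximised log-likelihood; lower values indicate a better trade-off between fit and complexity. The sketch below shows the comparison step only. The analyses reported here were run in Stata, and the log-likelihoods in the usage note are invented for illustration.

```python
def aic(log_likelihood, n_parameters):
    """Akaike's information criterion: 2k - 2 ln L."""
    return 2 * n_parameters - 2 * log_likelihood

def best_parametric_form(fits):
    """Pick the form with the lowest AIC.

    fits maps a form name to (maximised log-likelihood, parameter count).
    """
    return min(fits, key=lambda name: aic(*fits[name]))
```

For example, with hypothetical fits `{"exponential": (-1050.0, 5), "weibull": (-1040.0, 6), "gompertz": (-1042.5, 6)}` the AIC values are 2110, 2092 and 2097, so the Weibull form would be preferred on this criterion alone.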
Handling uncertainty and heterogeneity
Modelled outcomes are subject to several sources of uncertainty, which are important to report. Two forms of uncertainty are addressed within UKPDS-OM2. (1) Monte Carlo or ‘first order’ uncertainty arises as a result of comparing probabilities from risk equations against a random number to determine whether events take place at a patient level. Thus, in any model cycle, two identical patients may have different outcomes due to chance. We minimise this uncertainty by using large numbers of Monte Carlo replications until the mean of the outcome of interest is stable. (2) Parameter or ‘second order’ uncertainty in the estimated coefficients of the equations arises as a result of natural variation in the patient sample and limitations in the sample size for deriving the equations. Model parameters cannot be known with certainty but only within a certain parameter distribution. We captured parameter uncertainty by bootstrapping (with replacement) the UKPDS patient-level data and re-estimating all equations to derive sets of fully correlated regression coefficients. Parameter uncertainty was then propagated by using, in turn, these sets of regression coefficients to estimate different outcomes, thus providing a distribution from which CIs can be derived. This approach conforms to the American Diabetes Association guidelines on computer simulation modelling in diabetes.
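The two forms of uncertainty can be sketched in a few lines. This is a simplified illustration with hypothetical function names: the first two functions show the first order step (drawing a random number against a probability and averaging over replications), and the last two show the resampling and interval steps of the second order analysis. Re-estimating the equations on each bootstrap sample is not shown, and the simple percentile interval is our assumption rather than the authors' stated method.

```python
import random

def event_occurs(probability, rng):
    """First order step: compare a probability with a uniform random number."""
    return rng.random() < probability

def simulated_incidence(probabilities, n_replications, seed=0):
    """Mean proportion of patients with an event across replications."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_replications):
        total += sum(event_occurs(p, rng) for p in probabilities)
    return total / (n_replications * len(probabilities))

def bootstrap_sample(patient_ids, rng):
    """Second order step: resample patients with replacement."""
    return [rng.choice(patient_ids) for _ in patient_ids]

def percentile_interval(values, level=0.95):
    """Simple percentile interval from outcomes of many coefficient sets."""
    ordered = sorted(values)
    lower = int((1 - level) / 2 * (len(ordered) - 1))
    upper = int((1 + level) / 2 * (len(ordered) - 1))
    return ordered[lower], ordered[upper]
```

Increasing `n_replications` shrinks the Monte Carlo error of the mean, which is the sense in which first order uncertainty is minimised rather than reported.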
Patient heterogeneity is reflected through individual patient-level simulation, where it is possible to simulate whole populations, one patient at a time, and aggregate their outcomes. Each individual has a unique set of risk factors for estimation of their probability of events. Simulations presented in this manuscript use real data on 5,102 (UKPDS) and 3,984 (Lipids in Diabetes Study [LDS]) unique patients.
Internal validation of the model using the UKPDS trial population
Internal validation is a necessary step in the development of a model, providing confidence that model equations have been correctly specified and coded. We carried out internal validation of the simulation model by testing its performance in replicating the incidence of complications and mortality over 25 years of follow-up. This involved using the observed clinical risk factor profiles of all 5,102 UKPDS patients over 25 years, with risk factors carried forward when missing or at the end of follow-up. We compared simulated cumulative failure of each of the major outcomes of the model with the observed (Kaplan–Meier) cumulative failure of events under the assumption adopted in many clinical studies that death as well as date of last contact are censoring events.
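The observed comparator in this validation is the Kaplan–Meier cumulative failure, 1 - S(t), with death and last contact both treated as censoring. A minimal sketch of that estimator is given below; the function name and input layout are hypothetical, and a statistical package would normally be used in practice.

```python
def kaplan_meier_cumulative_failure(times, events):
    """Return [(time, cumulative failure)] at each distinct event time.

    times: follow-up time for each patient.
    events: True if the complication occurred at that time,
            False if the patient was censored (death or last contact).
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time.
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / at_risk
            curve.append((t, 1.0 - survival))
        at_risk -= n_leaving
    return curve
```

Simulated cumulative failure from the model can then be compared with this curve at matching time points.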
External validation, which tests model output against independent data, is beyond the scope of this manuscript and will be fully addressed in future publications. We present, instead, simulated outcomes using an external patient-level dataset as inputs to check on the consistency and face validity of the model.
Comparisons of outcomes from UKPDS-OM1 and UKPDS-OM2 simulations
We compared UKPDS-OM1 and UKPDS-OM2 predictions using as model inputs data on 3,984 patients with non-missing risk factors from a contemporary external dataset, the LDS. We used both models to predict 10 year cumulative event rates and remaining life expectancy for selected age groups. There are no observed event rates for the LDS, due to the study being stopped early. Given the illustrative nature of these applications we assumed clinical risk factors to remain constant over the 10 years and did not apply a discount rate. For comparisons of life expectancy, we ran both models to age 100 for selected age groups: 50–54 years, 60–64 years and 70–74 years. Point estimates of life expectancy were derived from 1,000 Monte Carlo replications and 95% CIs were determined from non-parametric bootstrapping.
We carried out one-way deterministic sensitivity analysis to increase understanding of the relationship between model inputs and outputs [20, 22] and to determine the relative importance of patient characteristics in driving aggregate outcomes of life expectancy. Using patient-level data from the LDS as inputs, we investigated the impact on remaining life expectancy of individually changing continuous risk factors by ±1 SD of the mean and of doubling and halving the rates of binary variables such as smoking.
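A one-way deterministic sensitivity analysis of this kind can be sketched as below. The sketch makes two assumptions that are not stated in the text: that each patient's value is shifted by one cohort SD, and that the model can be called as a function returning life expectancy for one patient. The `model` argument is a hypothetical stand-in, not the UKPDS-OM2 itself.

```python
from statistics import mean, stdev

def one_way_sensitivity(patients, factor, model):
    """Mean model output with one risk factor shifted by -1 SD, 0 and +1 SD.

    patients: list of dicts of risk factor values.
    factor: name of the continuous risk factor to vary.
    model: callable mapping one patient dict to an outcome (assumed).
    """
    sd = stdev([p[factor] for p in patients])

    def run(shift):
        shifted = [{**p, factor: p[factor] + shift} for p in patients]
        return mean([model(p) for p in shifted])

    return {"low": run(-sd), "base": run(0.0), "high": run(sd)}
```

All other risk factors are held at their observed values, so the difference between the "low" and "high" results isolates the influence of the single factor being varied.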
[Table: Number of events and average event rates observed during UKPDS and PTM for 5,102 participants with newly diagnosed type 2 diabetes mellitus. Columns: total number of events; annual event rate.]
We observed many linkages between events (e.g. having a history of IHD increases the probability of having an MI), shown schematically in ESM Fig. 1. The new model has more linkages between equations: in UKPDS-OM1 there were only five linkages across seven event equations, whilst in UKPDS-OM2 there are 15 linkages between the same seven equations (ESM Table 3).
In general, there were more significant covariates in the new set of event equations. Comparing the seven common event equations across both models, UKPDS-OM1 equations had a mean of approximately five covariates per equation, whereas UKPDS-OM2 equations have a mean of 11 (ESM Table 3). The new risk factors such as eGFR and micro- or macroalbuminuria were associated with a number of outcomes (ESM Fig. 1), including several types of vascular events (e.g. MI). White blood cell count, an indicator of inflammation, was also associated with a wide range of complications (MI, stroke, blindness, amputation and renal failure). A description of the risk equation covariates, including units, transformations and interpretation of hazard ratios, is presented in ESM Table 2. All fully specified risk equations, including constants, significant coefficients and standard errors, are provided in ESM Tables 4–6, and worked examples of how to calculate the absolute risk of an event occurring are in the ESM text.
A diagram summarising the four mutually exclusive death equations and how they relate to patients’ complications status in different years is presented in ESM Fig. 2. Smoking was a significant predictor in three of the four mortality equations, but the classic risk factors HbA1c and SBP were not independently significant predictors of mortality.
Comparison of UKPDS-OM1 and UKPDS-OM2
[Table: Comparisons of simulated percentage of patients with events at 10 years from UKPDS-OM1 and UKPDS-OM2. Columns: four patient groups (n = 555; n = 819; n = 578; n = 3,984). Rows: first MI (%); second MI (%); first stroke (%); second stroke (%); renal failure (%); first amputation (%); second amputation (%); heart failure (%).]
[Table: Simulated life expectancy (95% CI) for three age cohorts using patient-level data from the LDS cohort. Columns: duration of diabetes; remaining life expectancy (95% CI) in years.]
We have developed a substantially enhanced UKPDS outcomes model that uses an additional 38,000 patient-years of observational data (primarily from the PTM period), almost doubling the follow-up time used to develop the original model. During the extra follow-up many additional complications were observed, including second events, as the patients were older and had a longer duration of diabetes; this was also possibly because the patients were no longer participating in a clinical trial. The new outcomes model has a number of important enhancements: re-estimation of the event equations to include many additional risk factors; inclusion of an additional outcome of lower extremity ulcer; prediction of second events for MI, stroke and amputation; inclusion of additional linkages between complications; and substantial changes to modelling mortality. The greater number of linkages and the greater number of significant clinical risk factors in the equations compared with UKPDS-OM1 reflects the greater statistical power of a much larger dataset. For example, in the original model, being diagnosed with IHD elevated the subsequent risk of MI; but, in UKPDS-OM2, IHD also elevated the risk of stroke and blindness. We note that these interrelationships between complications may be due to other common factors not currently captured in the model.
Internal validation demonstrated a high degree of consistency between simulated and observed events over a long time period. This reliable representation of the epidemiology of diabetes complications is pertinent to future use of the model in cost-effectiveness analyses, as outcomes are usually modelled over a lifetime horizon. Sensitivity analysis confirmed the importance of the classic risk factors in driving model outcomes, but also demonstrated the importance of the new risk factors eGFR, micro- and macroalbuminuria, heart rate and white blood cells. The relative importance of clinical risk factors in predicting life expectancy depends on the number of risk equations in which they are significant and their associated hazard ratios.
In the head-to-head comparison of both simulation models, UKPDS-OM2 predicted fewer macrovascular events and higher survival over a 10 year period. There are a number of explanations. UKPDS-OM1 equations were derived from shorter durations of diabetes (up to 10 years) and represent greater out-of-sample extrapolation when evaluated over longer durations of diabetes. Also, the simplifying assumption in this example, of clinical risk factors remaining at baseline levels, confers additional reductions in risk for UKPDS-OM2 that are not captured in UKPDS-OM1, but the models converge in their predictions in year 1 when these assumptions are not made (ESM Table 7). Finally, there were some changes in the definitions of some endpoints (such as the removal of vascular death from MI) that limited the degree to which results can be directly compared. Life expectancy projections were also longer using UKPDS-OM2 for all age cohorts. These are consistent with downward secular trends in cardiovascular disease and improvements in mortality and are in line with previously reported estimates of the reduction in life expectancy due to diabetes.
The major strength of our model is that it is based on data from the longest follow-up study of patients with type 2 diabetes, including both clinical trial and observational data in patients with a long duration of diabetes who were considerably older than usual clinical trial participants (up to age 90 years). The explicit modelling of second events will enable the model to be used more confidently in secondary prevention for patients with already complicated diabetes. This comprehensive model of type 2 diabetes mellitus incorporates detailed modelling at a patient level and has the capacity to inform individualised medicine and analyses for patient subgroups.
There are a number of limitations to the simulation model in its current form. While the range of modelled outcomes has been extended, complications such as hyper- and hypoglycaemic episodes are not included. These were collected during the UKPDS, but not as true frequencies beyond a small number of deaths from hypoglycaemia. By expanding the number of outcomes and the number of input risk factors, the model has increased in complexity, but in many ways this reflects the nature of a disease that is characterised by so many different complications, the occurrence of which often is determined by interrelationships between risk factors and the patient’s clinical history. Finally, as we have used stepwise regression it would be useful to explore the relationship between risk factors and outcomes in other populations to test further the associations observed here.
There are a number of areas requiring further development. In its present form, the model requires individual patient time-paths of clinical risk factors or assumptions regarding the time-paths of baseline risk factors. We are currently developing models to predict these time-paths, which can be easily integrated as additional sub-models to reflect diabetes management practices in the populations of interest. We also need to demonstrate external validity and the applicability of the model to other populations such as those in South East Asia, which are known to have a different profile of complications. Further comparisons with other diabetes simulation models and stand-alone equations such as the UKPDS risk engine will allow us to assess whether the new outcomes model has improved predictions in different diabetic populations. Finally, the use of this new outcomes model for cost-effectiveness analysis will require derivation of QALY weights for events including ulcer and second events, and estimation of costs associated with complications. These enhancements will be addressed in future work.
A common criticism of many computer simulations is that they are a ‘black box’ with users having little understanding of the underlying relationships between input values and outcomes of the model. By contrast, the UKPDS-OM2 takes a completely transparent approach [6, 26], in which we have fully reported its development, the equations that determine all outcomes and the algorithm used to bring the elements of the model together.
The model will contribute to a greater understanding of the progression of diabetes and its complications and is likely to be used widely by epidemiologists, health economists and trialists. It will play a major role in comparative effectiveness, in cost-effectiveness analyses and in the evaluation of strategies for the management of diabetes in the future.
We acknowledge the UKPDS Group (see ESM). We thank P. Kelly (University of Sydney), B. Mihalova and M. Alva (both University of Oxford) for helpful discussions, and R. Coleman and I. Kennedy (Diabetes Trials Unit) for providing information on the UKPDS clinical trial data.
A. J. Hayes and P. M. Clarke acknowledge the support of Australian National Health and Medical Research Council project grant no. 512463 and capacity building grant no. 571372. A. M. Gray, J. Leal and R. R. Holman acknowledge the support of a UK Medical Research Council project grant on disease modelling (grant ID: 87386). R. R. Holman and A. M. Gray are NIHR senior investigators.
Duality of interest
The authors declare that there is no duality of interest associated with this manuscript.
AJH carried out statistical analyses, designed and programmed the model and drafted the manuscript; JL contributed to statistical analyses, carried out UKPDS-OM1 simulations and contributed to aspects of the manuscript; AMG contributed to the design of the study and aspects of the manuscript; RRH provided all the clinical data and provided clinical input on design of the study; PMC conceived the study, gave statistical and modelling advice and drafted the manuscript. All authors critically revised earlier drafts of the manuscript. All authors approved the final version of the manuscript.
- 5. Simon J, Gray AM, Clarke P, Wade A, Neil A, Farmer A (2008) Cost-effectiveness of self-monitoring of blood glucose in the management of patients with non-insulin treated type 2 diabetes: economic evaluation of data from the randomised controlled DiGEM trial. Br Med J 336:1177–1180
- 19. Briggs AH, Weinstein MC, Fenwick EA, Karnon J, Sculpher MJ, Paltiel AD (2012) ISPOR-SMDM Modeling Good Research Practices Task Force. Model parameter estimation and uncertainty: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-6. Value Health 15:835–842