UKPDS Outcomes Model 2: a new version of a model to simulate lifetime health outcomes of patients with type 2 diabetes mellitus using data from the 30 year United Kingdom Prospective Diabetes Study: UKPDS 82
Authors
Abstract
Aims/hypothesis
The aim of this project was to build a new version of the United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model (UKPDSOM1), a patient-level simulation tool for predicting lifetime health outcomes of people with type 2 diabetes mellitus.
Methods
Data from 5,102 UKPDS patients from the 20 year trial and the 4,031 survivors entering the 10 year post-trial monitoring period were used to derive parametric proportional hazards models predicting absolute risk of diabetes complications and death. We re-estimated the seven original event equations and estimated new equations for diabetic ulcer and some second events. The additional data permitted inclusion of new risk factor predictors such as estimated GFR. We also developed four new equations for all-cause mortality. Internal validation of model predictions of cumulative incidence of all events and death was carried out, and a contemporary patient-level dataset was used to compare 10 year predictions from the original and the new models.
Results
Model equations were based on a median 17.6 years of follow-up and up to 89,760 patient-years of data, providing double the number of events, greater precision and a larger number of significant covariates. The new model, UKPDSOM2, is internally valid over 25 years and predicts event rates for complications that are lower than those from the existing model.
Conclusions/interpretation
The new UKPDSOM2 has significant advantages over the existing model, as it captures more outcomes, is based on longer follow-up data, and more comprehensively captures the progression of diabetes. Its use will permit detailed and reliable lifetime simulations of key health outcomes in people with type 2 diabetes mellitus.
Keywords
Complications, Life expectancy, Patient-level simulation, Risk modelling, Survival, Type 2 diabetes mellitus
Abbreviations
CHF: Congestive heart failure
eGFR: Estimated GFR
IHD: Ischaemic heart disease
LDS: Lipids in Diabetes Study
MI: Myocardial infarction
OM1: Outcomes Model version 1
OM2: Outcomes Model version 2
PTM: Post-trial monitoring
PVD: Peripheral vascular disease
QALYs: Quality-adjusted life years
SBP: Systolic BP
UKPDS: United Kingdom Prospective Diabetes Study
Introduction
Computer simulation is a method of modelling the progression of type 2 diabetes mellitus and predicting long-term outcomes of the disease. Since the publication of the first United Kingdom Prospective Diabetes Study (UKPDS) Outcomes Model (UKPDSOM1) [1], the use of simulation modelling in diabetes has increased, with at least eight models in use worldwide [2, 3], many of which use the published equations from UKPDSOM1 [3]. UKPDSOM1 is a multi-application model and has been used in a wide variety of applications, including cost-effectiveness analyses [4, 5] and prediction of life expectancy [6].
The UKPDSOM1 has been tested alongside several other diabetes simulation models at Mount Hood Challenge meetings. A general conclusion was that the models performed reasonably well in terms of predicting the relative risk of interventions vs control treatments, but less well in predicting absolute risk [3]. Additionally, a temporal validation study found that UKPDSOM1 overpredicted the probability of events for high-risk subgroups [7].
Model building is an iterative process and models need to be updated as new information becomes available [8]. The UKPDSOM1 was based on trial data collected in the UKPDS up until 1997. Additional information collected during the UKPDS 10 year post-trial monitoring (PTM) period [9] provided an opportunity to update the simulation model and to incorporate data on new risk factors and outcomes that were unavailable when the UKPDSOM1 was constructed.
Our aim was to build a new model, Outcomes Model version 2 (UKPDSOM2), based on the larger dataset, which included additional, more recent data that were also more clinically relevant, as participants were no longer in a clinical trial. This involved: (1) re-estimating, over a longer duration of follow-up, the seven original risk equations for complications (myocardial infarction [MI], ischaemic heart disease [IHD], stroke, congestive heart failure [CHF], amputation, blindness and renal failure); (2) estimating new equations, not in the original model, for diabetic ulcer and some second events; (3) developing new equations for all-cause mortality; and (4) exploring the use of new risk factors, such as microalbuminuria, which have been shown to be predictive of diabetes-related complications.
We also present internal validation of the UKPDSOM2 over 25 years of follow-up, carry out a sensitivity analysis and compare predictions from the original and new models using a contemporary patient-level input dataset.
Methods
The model—UKPDSOM2
Derivation of risk model equations
Study subjects and measurement of outcomes
Model equations were based on patient-level data for the 5,102 UKPDS participants with newly diagnosed type 2 diabetes mellitus, aged 25–65 years, recruited between 1977 and 1991 [10]. These patients were followed until the trial concluded in 1997. All 4,031 surviving participants entered the 10 year PTM observational study [9], during which time they returned to their community or hospital-based diabetes care providers, with no attempt made to maintain their previous allocated trial regimen. All patients provided written informed consent. Approval was obtained from the ethics committees at all 23 clinical centres, and the study conformed to the Declaration of Helsinki guidelines.
During the main trial, patients were seen three or four times each year in UKPDS clinics. During PTM, patients were seen annually for 5 years in UKPDS clinics, with continued standardised collection of outcome data plus clinical examination every 3 years. In years 6–10, patient and general practitioner questionnaires were used to follow patients remotely, since funding for clinic visits was not available. The vital status of all patients who were still living in the UK was obtained from the Office for National Statistics.
Outcomes were adjudicated exactly as in the original trial, by the UKPDS endpoint committee, which was blinded to study groups. The definitions of the outcomes used in the UKPDSOM2 match adjudicated trial endpoints and the original UKPDSOM1 outcomes, except for vascular cardiac events, where CHF and other IHD now include both fatal and non-fatal events. Additional outcomes of diabetic ulcer of the lower limb, and second events for MI, stroke and amputation, were derived. Definitions of outcomes by ICD-9 are detailed in electronic supplementary material (ESM) Table 1.
Clinical risk factors
We used a set of clinical risk factors as candidate predictor variables that were similar to those used in UKPDSOM1 (i.e. systolic blood pressure [SBP], HbA_{1c}, lipids) but with the following modifications: HDL and LDL cholesterol were included separately; BMI, peripheral vascular disease (PVD) and atrial fibrillation used updated rather than just baseline values. We also included risk factors shown in recent studies to be potentially predictive of diabetic complications: micro- or macroalbuminuria [11], estimated GFR (eGFR) [12], heart rate [13] and white blood cell count [14]. Haemoglobin was also included, as it has been shown to be an independent predictor of mortality in patients with CHF [15].
Statistical analysis
Risk equations for first occurrence of eight diabetes complications and three additional second event equations for MI, stroke and amputation were developed. Multivariate parametric proportional hazards survival models were derived with time to event determined in continuous time from onset of diabetes, using the censor date of death or the date of last contact with the patient. The set of candidate covariates for each equation included time-invariant factors (e.g. sex, age at diagnosis of diabetes), time-varying clinical risk factors (e.g. HbA_{1c} and SBP) and time-varying comorbidities (e.g. history of stroke). One year lagged values were used for clinical risk factors to avoid possible confounding by risk factors measured after a complication. A full description of the risk factor covariates is presented in ESM Table 2.
Risk equations for all-cause mortality were developed to take account of patients’ complication status in different years. These included logistic models to capture the high mortality in the year of a complication, and Gompertz proportional hazards survival models for years in which there were no complications. Thus, in any patient-year, only one of the four mutually exclusive equations for prediction of absolute annual risk of mortality would be used. Preliminary analysis showed that all complications except blindness and ulcer were associated with mortality in the current year (p < 0.05). Hence, logistic regression models were used to estimate the probability of death in the year of any MI, stroke, amputation (first or second), CHF, IHD or renal failure. Following tests for best fit, and to maximise transparency, we derived two separate logistic equations for patients with and without a history of complications.
The two remaining equations for death are multivariate Gompertz proportional hazards survival models to estimate the hazard of death in years without any of the complications defined above. Time to death was determined in continuous time, with time at risk modelled by a patient’s current age in order to allow extrapolation beyond the observed follow-up period [16]. The censor date for deaths was 30 September 2007, the date of linkage to the national mortality database from the Office for National Statistics, or the date of emigration from the UK, which represents date lost to follow-up in the national statistics.
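As a minimal sketch of how the two kinds of mortality equation yield an annual probability of death, the Python below assumes the standard Gompertz proportional hazards form h(t) = exp(xb) · exp(γt), with t the patient's current age, and a standard logistic form. The function names and the numerical values are hypothetical and are not taken from the ESM tables.

```python
import math

def gompertz_annual_death_prob(linear_predictor, gamma, age):
    """Probability of dying between `age` and `age + 1`, assuming the hazard
    h(t) = exp(linear_predictor) * exp(gamma * t) with t = current age.
    Requires gamma != 0."""
    def cumulative_hazard(t):
        # Integrated hazard H(t) = exp(xb) * (exp(gamma * t) - 1) / gamma
        return math.exp(linear_predictor) * (math.exp(gamma * t) - 1.0) / gamma
    # Conditional on being alive at `age`: 1 - exp(-(H(age + 1) - H(age)))
    return 1.0 - math.exp(cumulative_hazard(age) - cumulative_hazard(age + 1))

def logistic_death_prob(linear_predictor):
    """Probability of death in the year of a complication (logistic form)."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Hypothetical inputs for illustration only
p60 = gompertz_annual_death_prob(linear_predictor=-10.0, gamma=0.09, age=60)
p70 = gompertz_annual_death_prob(linear_predictor=-10.0, gamma=0.09, age=70)
p_event_year = logistic_death_prob(-1.5)
```

Because time at risk is the patient's current age, the same function can be evaluated at ages beyond the observed follow-up, which is the extrapolation property described above.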
Proportional hazards models for complications and death were derived using a consistent process to select significant covariates from the candidate risk predictors. First, binary covariates were excluded from particular event equations if cross-tabulations indicated that they occurred rarely (fewer than ten occurrences). Then a multivariate model was fitted with all remaining covariates, and any with p > 0.3 were dropped. The significant covariates of the final risk model were selected in a backwards stepwise regression at p < 0.05. The parametric form of the underlying hazard was examined graphically and models were chosen by consideration of Akaike’s information criterion for exponential, Weibull and Gompertz parametric forms. The proportional hazards assumption was tested by examination of Schoenfeld residuals [17] in comparable Cox models and through Cox–Snell semi-log plots. If the effect of any covariate was identified as non-linear, it was modelled either as a categorical variable or as a continuous spline function with suitable knot points. We specifically investigated any U-shaped HbA_{1c} effect [18] using continuous splines. All analyses were carried out using Stata version 12.0 software (Stata, College Station, TX, USA).
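The comparison of parametric forms by Akaike's information criterion can be illustrated with a short Python sketch. The log-likelihoods and parameter counts below are hypothetical placeholders, not fitted values from the UKPDS data; the lowest AIC identifies the preferred form.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical (log-likelihood, number of parameters) for each candidate form
candidates = {
    "exponential": (-1250.0, 5),
    "weibull": (-1238.5, 6),
    "gompertz": (-1240.2, 6),
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)  # form with the lowest AIC
```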
Handling uncertainty and heterogeneity
Modelled outcomes are subject to several sources of uncertainty, which are important to report [19]. Two forms of uncertainty are addressed within UKPDSOM2. (1) Monte Carlo or ‘first order’ uncertainty arises as a result of comparing probabilities from risk equations against a random number to determine whether events take place at a patient level. Thus, in any model cycle, two identical patients may have different outcomes due to chance. We minimise this uncertainty by using large numbers of Monte Carlo replications until the mean of the outcome of interest is stable. (2) Parameter or ‘second order’ uncertainty in the estimated coefficients of the equations arises as a result of natural variation in the patient sample and limitations in the sample size for deriving the equations. Model parameters cannot be known with certainty but only within a certain parameter distribution. We captured parameter uncertainty by bootstrapping (with replacement) the UKPDS patient-level data and re-estimating all equations to derive sets of fully correlated regression coefficients. Parameter uncertainty was then propagated by using, in turn, these sets of regression coefficients to estimate different outcomes, thus providing a distribution from which CIs can be derived. This approach conforms to the American Diabetes Association guidelines on computer simulation modelling in diabetes [20].
Patient heterogeneity is reflected through individual patient-level simulation, where it is possible to simulate whole populations, one patient at a time, and aggregate their outcomes. Each individual has a unique set of risk factors for estimation of their probability of events. Simulations presented in this manuscript use real data on 5,102 (UKPDS) and 3,984 (Lipids in Diabetes Study [LDS]) unique patients.
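The two layers of uncertainty can be sketched in Python under simplifying assumptions. Here a single constant annual event probability stands in for a full set of risk equations, each entry of `bootstrap_probs` stands in for one bootstrap re-estimation of the coefficients, and a simple percentile interval is used. All names and values are hypothetical.

```python
import random

def simulate_event_fraction(annual_prob, years, n_replications, rng):
    """First-order (Monte Carlo) layer: in each yearly cycle the event
    probability is compared against a uniform random number."""
    events = 0
    for _ in range(n_replications):
        for _ in range(years):
            if rng.random() < annual_prob:
                events += 1
                break  # only the first event is counted
    return events / n_replications

def percentile_interval(values, lower=2.5, upper=97.5):
    """Second-order layer: percentile interval over bootstrap outcomes."""
    ordered = sorted(values)
    def pick(p):
        return ordered[round(p / 100 * (len(ordered) - 1))]
    return pick(lower), pick(upper)

rng = random.Random(42)
# Hypothetical stand-ins for bootstrapped coefficient sets
bootstrap_probs = [0.010, 0.011, 0.012, 0.013, 0.014]
outcomes = [simulate_event_fraction(p, 10, 20000, rng) for p in bootstrap_probs]
low, high = percentile_interval(outcomes)
```

Increasing `n_replications` shrinks the Monte Carlo noise within each bootstrap set, so the spread of `outcomes` reflects parameter uncertainty rather than chance alone.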
Internal validation of the model using the UKPDS trial population
Internal validation is a necessary step in the development of a model, providing confidence that model equations have been correctly specified and coded [8]. We carried out internal validation of the simulation model by testing its performance in replicating the incidence of complications and mortality over 25 years of follow-up. This involved using the observed clinical risk factor profiles of all 5,102 UKPDS patients over 25 years, with risk factors carried forward when missing or at the end of follow-up. We compared simulated cumulative failure of each of the major outcomes of the model with the observed (Kaplan–Meier) cumulative failure of events under the assumption adopted in many clinical studies that death as well as date of last contact are censoring events.
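For readers unfamiliar with the observed comparator, the following Python sketch computes a Kaplan–Meier cumulative failure curve in which death and last contact are both treated as censoring, as described above. The function name and the toy data are illustrative assumptions, not UKPDS data.

```python
def kaplan_meier_cumulative_failure(times, events):
    """Return [(time, 1 - S(time))] at each distinct event time.
    events[i] is True for the outcome of interest and False when the
    patient was censored (death or date of last contact)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients whose follow-up ends at the same time t
        while i < len(data) and data[i][0] == t:
            n_events += 1 if data[i][1] else 0
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, 1.0 - survival))
        n_at_risk -= n_leaving
    return curve

# Toy example: four patients, two events, two censored
curve = kaplan_meier_cumulative_failure([1, 2, 3, 4], [True, False, True, False])
```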
External validation, which tests model output against independent data, is beyond the scope of this manuscript and will be fully addressed in future publications. We present, instead, simulated outcomes using an external patient-level dataset as inputs to check on the consistency and face validity of the model.
Comparisons of outcomes from UKPDSOM1 and UKPDSOM2 simulations
We compared UKPDSOM1 and UKPDSOM2 predictions using as model inputs data on 3,984 patients with non-missing risk factors from a contemporary external dataset, the LDS [21]. We used both models to predict 10 year cumulative event rates and remaining life expectancy for selected age groups. There are no observed event rates for the LDS, due to the study being stopped early. Given the illustrative nature of these applications, we assumed clinical risk factors to remain constant over the 10 years and did not apply a discount rate. For comparisons of life expectancy, we ran both models to age 100 for selected age groups: 50–54 years, 60–64 years and 70–74 years. Point estimates of life expectancy were derived from 1,000 Monte Carlo replications and 95% CIs were determined from non-parametric bootstrapping.
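As a rough deterministic analogue of the life expectancy calculation, the Python sketch below sums the probability of surviving each successive year up to a fixed horizon. It is an assumption-laden simplification: the model itself averages over Monte Carlo replications, whereas this sketch takes a ready-made list of hypothetical annual death probabilities and counts only full years survived.

```python
def remaining_life_expectancy(annual_death_probs):
    """Expected number of full years survived, given the probability of
    death in each successive year (undiscounted)."""
    survival = 1.0
    expected_years = 0.0
    for p in annual_death_probs:
        survival *= 1.0 - p          # probability of surviving this year
        expected_years += survival   # contributes one year if still alive
    return expected_years

# Hypothetical patient aged 70, run to age 100 with a flat 5% annual risk
le_70 = remaining_life_expectancy([0.05] * 30)
```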
Sensitivity analysis
We carried out one-way deterministic sensitivity analysis to increase understanding of the relationship between model inputs and outputs [20, 22] and to determine the relative importance of patient characteristics in driving aggregate outcomes of life expectancy. Using patient-level data from the LDS as inputs, we investigated the impact on remaining life expectancy of individually changing continuous risk factors by ±1 SD of the mean and of doubling and halving the rates of binary variables such as smoking.
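The one-way procedure for continuous risk factors can be sketched in Python as follows. The linear `toy_life_expectancy` function is a hypothetical stand-in for a full simulation run, and the baseline values and SDs are invented for illustration.

```python
def one_way_sensitivity(model, baseline, sds):
    """Shift one input at a time by -1 SD and +1 SD, holding the others at
    baseline, and record the change in the model output."""
    base_value = model(baseline)
    results = {}
    for name, sd in sds.items():
        low = dict(baseline, **{name: baseline[name] - sd})
        high = dict(baseline, **{name: baseline[name] + sd})
        results[name] = (model(low) - base_value, model(high) - base_value)
    return results

def toy_life_expectancy(inputs):
    # Hypothetical linear stand-in for the simulation model
    return 30.0 - 0.5 * inputs["hba1c"] - 0.02 * inputs["sbp"]

baseline = {"hba1c": 8.0, "sbp": 140.0}   # hypothetical cohort means
sds = {"hba1c": 1.5, "sbp": 20.0}         # hypothetical standard deviations
effects = one_way_sensitivity(toy_life_expectancy, baseline, sds)
```

Ranking the inputs by the absolute size of these changes gives the ordering usually displayed in a tornado diagram.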
Results
Risk equations
Number of events and average event rates observed during UKPDS and PTM for 5,102 participants with newly diagnosed type 2 diabetes mellitus

| Event | Total number of events | Annual event rate^{a} |
|---|---|---|
| Death | 2,260 | 0.0252 |
| First MI | 1,014 | 0.0113 |
| Second MI | 169 | 0.0019 |
| First stroke | 504 | 0.0056 |
| Second stroke | 78 | 0.0009 |
| CHF | 351 | 0.0039 |
| IHD | 749 | 0.0083 |
| First amputation | 171 | 0.0019 |
| Second amputation | 58 | 0.0006 |
| Blindness | 271 | 0.003 |
| Renal failure | 113 | 0.0013 |
| Ulcer | 97 | 0.0011 |
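The tabulated annual event rates appear consistent with dividing each event count by the 89,760 patient-years quoted in the Results summary. That denominator is an assumption here, since the table footnote is not reproduced in this text; the Python check below reproduces a few of the rates under it.

```python
# Assumption: every rate uses the total of 89,760 patient-years as denominator
PATIENT_YEARS = 89_760

event_counts = {
    "Death": 2260,
    "First MI": 1014,
    "Blindness": 271,
    "Ulcer": 97,
}

# Events per patient-year, rounded to four decimal places as in the table
rates = {name: round(n / PATIENT_YEARS, 4) for name, n in event_counts.items()}
```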
We observed many linkages between events (e.g. having a history of IHD increases the probability of having an MI), shown schematically in ESM Fig. 1. The new model has more linkages between equations: in UKPDSOM1 there were only five linkages across seven event equations, whilst in UKPDSOM2 there are 15 linkages between the same seven equations (ESM Table 3).
In general, there were more significant covariates in the new set of event equations. Comparing the seven common event equations across both models, UKPDSOM1 equations had approximately five, whereas UKPDSOM2 equations have a mean of 11 covariates per equation (ESM Table 3). The new risk factors such as eGFR and micro- or macroalbuminuria were associated with a number of outcomes (ESM Fig. 1), including several types of vascular events (e.g. MI). White blood cell count, an indicator of inflammation, was also associated with a wide range of complications (MI, stroke, blindness, amputation and renal failure). A description of the risk equation covariates, including units, transformations and interpretation of hazard ratios, is presented in ESM Table 2. All fully specified risk equations, including constants, significant coefficients and standard errors, are provided in ESM Tables 4–6, and worked examples of how to calculate the absolute risk of an event occurring are in the ESM text.
A diagram summarising the four mutually exclusive death equations and how they relate to patients’ complications status in different years is presented in ESM Fig. 2. Smoking was a significant predictor in three of the four mortality equations, but the classic risk factors HbA_{1c} and SBP were not independently significant predictors of mortality.
Internal validation
Comparison of UKPDSOM1 and UKPDSOM2
Comparisons of simulated percentage^{a} of patients with events at 10 years from UKPDSOM1 and UKPDSOM2

| Event type | 50–54 years (n = 555), OM1 | 50–54 years (n = 555), OM2 | 60–64 years (n = 819), OM1 | 60–64 years (n = 819), OM2 | 70–74 years (n = 578), OM1 | 70–74 years (n = 578), OM2 | All ages (n = 3,984), OM1 | All ages (n = 3,984), OM2 |
|---|---|---|---|---|---|---|---|---|
| First MI (%) | 14.9 | 7.5 | 22.5 | 10.3 | 29.6 | 13.3 | 21.0 | 9.9 |
| Second MI (%) | n/a | 0.9 | n/a | 1.0 | n/a | 1.1 | n/a | 1.0 |
| Ulcer (%) | n/a | 1.5 | n/a | 1.9 | n/a | 2.2 | n/a | 1.8 |
| Blindness (%) | 2.2 | 2.2 | 3.5 | 3.1 | 4.9 | 3.95 | 3.3 | 2.9 |
| IHD (%) | 8.6 | 6.9 | 10.3 | 8.3 | 10.5 | 9 | 9.5 | 7.8 |
| First stroke (%) | 3.3 | 3.3 | 7.9 | 6.4 | 14.2 | 10.7 | 7.6 | 6.2 |
| Second stroke (%) | n/a | 0.3 | n/a | 0.7 | n/a | 1.5 | n/a | 0.71 |
| Renal failure (%) | 0.9 | 0.3 | 1.4 | 0.6 | 1.6 | 0.75 | 1.3 | 0.5 |
| First amputation (%) | 1.7 | 1.3 | 2.0 | 1.6 | 1.7 | 1.8 | 1.8 | 1.5 |
| Second amputation (%) | n/a | 0.4 | n/a | 0.6 | n/a | 0.4 | n/a | 0.44 |
| Heart failure (%) | 3.0 | 2.5 | 5.9 | 4.3 | 9.9 | 6.4 | 5.7 | 4.0 |
| Death (%) | 14.5 | 11.1 | 32.1 | 22.3 | 58.8 | 43.3 | 31.6 | 22.5 |
Simulated life expectancy (95% CI) for three age cohorts using patient-level data from the LDS cohort

| Patient group | N | Duration of diabetes, mean (SE) | Remaining life expectancy in years (95% CI), UKPDSOM1^{a} | Remaining life expectancy in years (95% CI), UKPDSOM2^{b} |
|---|---|---|---|---|
| 50–54 years | 554 | 6.4 (0.21) | 20.0 (17.7–23.0) | 25.1 (24.5–25.7) |
| 60–64 years | 819 | 8.1 (0.21) | 13.9 (12.7–16.2) | 17.7 (17.1–18.3) |
| 70–74 years | 578 | 9.3 (0.28) | 9.1 (8.4–10.7) | 11.7 (11.2–12.2) |
Sensitivity analysis
Discussion
We have developed a substantially enhanced UKPDS outcomes model that uses an additional 38,000 patient-years of observational data (primarily from the PTM period), almost doubling the follow-up time used to develop the original model. During the extra follow-up many additional complications were observed, including second events, as the patients were older and had a longer duration of diabetes; this was also possibly because the patients were no longer participating in a clinical trial. The new outcomes model has a number of important enhancements: re-estimation of the event equations to include many additional risk factors; inclusion of an additional outcome of lower extremity ulcer; prediction of second events for MI, stroke and amputation; inclusion of additional linkages between complications; and substantial changes to modelling mortality. The greater number of linkages and the greater number of significant clinical risk factors in the equations compared with UKPDSOM1 reflect the greater statistical power of a much larger dataset. For example, in the original model, being diagnosed with IHD elevated the subsequent risk of MI; but, in UKPDSOM2, IHD also elevated the risk of stroke and blindness. We note that these interrelationships between complications may be due to other common factors not currently captured in the model.
Internal validation demonstrated a high degree of consistency between simulated and observed events over a long time period. This reliable representation of the epidemiology of diabetes complications is pertinent to future use of the model in cost-effectiveness analyses, as outcomes are usually modelled over a lifetime horizon. Sensitivity analysis confirmed the importance of the classic risk factors in driving model outcomes, but also demonstrated the importance of the new risk factors eGFR, micro- and macroalbuminuria, heart rate and white blood cells. The relative importance of clinical risk factors in predicting life expectancy depends on the number of risk equations in which they are significant and their associated hazard ratios.
In the head-to-head comparison of both simulation models, UKPDSOM2 predicted fewer macrovascular events and higher survival over a 10 year period. There are a number of explanations. UKPDSOM1 equations were derived from shorter durations of diabetes (up to 10 years) and represent greater out-of-sample extrapolation when evaluated over longer durations of diabetes. Also, the simplifying assumption in this example, of clinical risk factors remaining at baseline levels, confers additional reductions in risk for UKPDSOM2 that are not captured in UKPDSOM1, but the models diverge in their predictions in year 1 when these assumptions are not made (ESM Table 7). Finally, there were some changes in definitions in some endpoints (such as the removal of vascular death from MI) that limited the degree to which results can be directly compared. Life expectancy projections were also longer using UKPDSOM2 for all age cohorts. These are consistent with downward secular trends in cardiovascular disease and improvements in mortality and are in line with previously reported estimates of the reduction in life expectancy due to diabetes [23].
The major strength of our model is that it is based on data from the longest follow-up study of patients with type 2 diabetes, including both clinical trial and observational data in patients with a long duration of diabetes who were considerably older than usual clinical trial participants (up to age 90 years). The explicit modelling of second events will enable the model to be used more confidently in secondary prevention for patients with already complicated diabetes [7]. This comprehensive model of type 2 diabetes mellitus incorporates detailed modelling at a patient level and has the capacity to inform individualised medicine and analyses for patient subgroups.
There are a number of limitations to the simulation model in its current form. While the range of modelled outcomes has been extended, complications such as hyper- and hypoglycaemic episodes are not included. These were collected during the UKPDS, but not as true frequencies beyond a small number of deaths from hypoglycaemia. By expanding the number of outcomes and the number of input risk factors, the model has increased in complexity, but in many ways this reflects the nature of a disease that is characterised by so many different complications, the occurrence of which often is determined by interrelationships between risk factors and the patient’s clinical history. Finally, as we have used stepwise regression it would be useful to explore the relationship between risk factors and outcomes in other populations to test further the associations observed here.
There are a number of areas requiring further development. In its present form, the model requires individual patient time-paths of clinical risk factors or assumptions regarding the time-paths of baseline risk factors. We are currently developing models to predict these time-paths, which can be easily integrated as additional sub-models to reflect diabetes management practices in the populations of interest. We also need to demonstrate external validity and the applicability of the model to other populations such as those in South East Asia, which are known to have a different profile of complications [24]. Further comparisons with other diabetes simulation models and stand-alone equations such as the UKPDS risk engine [25] will allow us to assess whether the new outcomes model has improved predictions in different diabetic populations. Finally, the use of this new outcomes model for cost-effectiveness analysis will require derivation of QALY weights for events including ulcer and second events, and estimation of costs associated with complications. These enhancements will be addressed in future work.
A common criticism of many computer simulations is that they are a ‘black box’ with users having little understanding of the underlying relationships between input values and outcomes of the model. By contrast, the UKPDSOM2 takes a completely transparent approach [6, 26], in which we have fully reported its development, the equations that determine all outcomes and the algorithm used to bring the elements of the model together.
The model will contribute to a greater understanding of the progression of diabetes and its complications and is likely to be used widely by epidemiologists, health economists and trialists. It will play a major role in comparative effectiveness, in cost-effectiveness analyses and in the evaluation of strategies for the management of diabetes in the future.
Acknowledgements
We acknowledge the UKPDS Group (see ESM). We thank P. Kelly (University of Sydney), B. Mihalova and M. Alva (both University of Oxford) for helpful discussions, and R. Coleman and I. Kennedy (Diabetes Trials Unit) for providing information on the UKPDS clinical trial data.
Funding
A. J. Hayes and P. M. Clarke acknowledge the support of Australian National Health and Medical Research Council project grant no. 512463 and capacity building grant no. 571372. A. M. Gray, J. Leal and R. R. Holman acknowledge the support of a UK Medical Research Council project grant on disease modelling (grant ID: 87386). R. R. Holman and A. M. Gray are NIHR senior investigators.
Duality of interest
The authors declare that there is no duality of interest associated with this manuscript.
Contribution statement
AJH carried out statistical analyses, designed and programmed the model and drafted the manuscript; JL contributed to statistical analyses, carried out UKPDSOM1 simulations and contributed to aspects of the manuscript; AMG contributed to the design of the study and aspects of the manuscript; RRH provided all the clinical data and provided clinical input on design of the study; PMC conceived the study, gave statistical and modelling advice and drafted the manuscript. All authors critically revised earlier drafts of the manuscript. All authors approved the final version of the manuscript.