We have developed and tested an RCE that predicts an individual’s risk from contemporaneous, routinely collected clinical data, referenced to the clinical histories of the local population and using covariates of local relevance. The risk can be reassessed at each screening episode as new clinical information is acquired.
The Markov approach we have used allows a dynamic model of the retinopathy history to be built. In a sense, the model ‘compresses’ the information about time evolution. The Markov property can be summarised by the phrase ‘The future is predicted from the past through the present’, and is particularly appropriate to our setting.
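The Markov property described above can be made concrete with a small sketch. The states, transition probabilities and discrete time step below are entirely illustrative (the fitted Liverpool model is not reproduced here); the point is that the predicted future distribution depends on the past only through the present state.

```python
# Hypothetical 3-state retinopathy chain: 0 = no retinopathy,
# 1 = background retinopathy, 2 = sight-threatening (STDR).
# Transition probabilities per screening interval are illustrative only.
P = [
    [0.90, 0.09, 0.01],
    [0.00, 0.85, 0.15],
    [0.00, 0.00, 1.00],   # STDR treated as absorbing for this sketch
]

def step(dist, P):
    """One Markov step: new state distribution from the current one."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def predict(dist, P, k):
    """k-step-ahead state distribution; the future is predicted from the
    past only through the present distribution (the Markov property)."""
    for _ in range(k):
        dist = step(dist, P)
    return dist

# A person currently graded 'background' (state 1), two intervals ahead:
predict([0.0, 1.0, 0.0], P, 2)   # ≈ [0.0, 0.7225, 0.2775]
```

Because the chain carries the whole grading history forward in the current state distribution, re-running `predict` at each screening episode updates the risk without refitting anything, which is the sense in which the model ‘compresses’ time evolution.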
The strengths of our model include our approach to handling routine screening data. Retinopathy data from screening are interval censored [16]: an event is only known to have occurred at some point between two screens, yet it is recorded at the screen that detects it. This can make the disease appear to have developed later than it actually did, leading to biased estimates. Unlike ‘classic’ model types such as the Cox model, the Markov approach handles this interval censoring internally. In addition, it predicts transition probabilities for all disease states. ‘Real life’ data from routine clinical practice inevitably contain missingness and recording errors, so we embedded a model for multiple imputation of missing covariates, which was required for our RCE to run effectively.
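The paper does not specify how estimates are combined across the multiply imputed datasets; a standard choice is Rubin’s rules, sketched here with hypothetical per-imputation estimates and variances. The `pool_rubin` helper and the numbers in the example are assumptions for illustration only.

```python
def pool_rubin(estimates, variances):
    """Rubin's rules: pool a parameter estimate across m imputed datasets.
    Returns the pooled estimate and its total variance (within-imputation
    variance plus between-imputation variance inflated by 1 + 1/m)."""
    m = len(estimates)
    q_bar = sum(estimates) / m
    u_bar = sum(variances) / m                              # within-imputation
    b = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation
    return q_bar, u_bar + (1 + 1 / m) * b

# Hypothetical log-hazard estimates from m = 3 imputed datasets:
pooled, total_var = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
```

The inflation term `(1 + 1/m) * b` is what makes pooled standard errors honestly reflect the extra uncertainty introduced by imputing missing covariates.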
Potential limitations of our RCE relate to model design and some of the covariates. We did not adjust for misclassification of retinopathy during grading; this could be addressed by adding a misclassification model, but at the cost of substantially more observations and computational complexity. Some covariates were not informative in the Liverpool setting: ethnic diversity is low and the prevalence of abnormal eGFR (<60 ml min⁻¹ (1.73 m)⁻²) was only 14.5%. Other covariates, such as social deprivation score, may be worth adding. ‘Type of diabetes’ may not be accurately recorded in primary care, and the increasing use of insulin in type 2 diabetes makes ‘insulin usage’ an unreliable criterion. We used the date of the first HbA1c test to improve data on ‘duration of diabetes’; this was especially helpful in people with long durations, but has become less reliable since the introduction of HbA1c as a primary screening test.
The model consistently showed good predictive performance at the 2.5% risk threshold. The numbers of screen-positive cases with overestimated screening dates and of screen-negative cases with underestimated screening dates were both reduced. The majority of people were correctly allocated (78% of screen positives, 80% of screen negatives), with a reasonable allocation of approximately 10%:10%:80% across the 6, 12 and 24 month intervals. The number of patients who had the screen event before the allocated screening date was reduced by more than half, and the overall number of screening episodes was reduced by 30%.
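A minimal sketch of threshold-based interval allocation. The decision rule and the `allocate_interval` helper are assumptions for illustration, not the RCE’s actual algorithm: each person receives the longest of the 6, 12 and 24 month intervals for which their cumulative predicted STDR risk stays below the 2.5% threshold.

```python
def allocate_interval(risk_6m, risk_12m, risk_24m, threshold=0.025):
    """Assign the longest screening interval (months) whose cumulative
    predicted STDR risk stays below the risk threshold. Illustrative
    logic only; the study's actual allocation rule is not reproduced."""
    if risk_24m < threshold:
        return 24
    if risk_12m < threshold:
        return 12
    return 6    # high-risk default: shortest interval

# Hypothetical cumulative risks for one person at 6, 12 and 24 months:
allocate_interval(0.010, 0.020, 0.030)   # exceeds 2.5% only at 24 months
```

Under a rule of this shape, lengthening the interval for low-risk people is what drives the reduction in total screening episodes, while high-risk people fall through to the 6 month default.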
We included a strongly embedded local patient group, which allowed us to develop an appropriate preliminary covariate list and acceptable screen intervals and risk threshold. This group developed expertise over a series of meetings and provided substantial input into design and implementation. Strong patient and professional involvement is very valuable in study design and delivery.
Our RCE development process is suitable for a wide range of geographical locations and populations with a minimum prerequisite of a centrally maintained disease register with adequate historical data. Revision/addition of covariates can be accommodated based on the strength they add to a locally developed model. For example, higher prevalence of poor diabetes control or renal disease may strengthen the effect of HbA1c or eGFR. Alternative intervals including extension beyond 24 months could be developed subject to acceptability. Local populations may select alternative risk thresholds depending on the perception of risk. We give the key steps to developing and building such a system in the text box.
Our approach is novel in its use of near-real-time data and a model developed from local data. Aspelund et al developed a risk-estimating model in Iceland [15], using a proportional hazards Weibull model informed by local retinopathy data from 1994 to 1997, with covariate risks estimated from data published in the 1990s. ROC analysis showed fair performance, with 59% fewer visits than annual screening. Van der Heijden et al tested this model in an up-to-date prospective cohort of people with type 2 diabetes [25]. Of 8303 people, 3319 met the eligibility criteria, with a mean follow-up of 53 months. Discriminatory ability was good (C-statistic 0.83), but 67 of the 76 people (88.2%) who developed STDR did so after the time predicted by the model. This overestimation of risk highlights the weakness of using historical data.
Hippisley-Cox and Coupland recently developed equations to predict 10 year rates of amputation and blindness using methods similar to ours [26]. They studied routinely collected general practice and hospital episode data from 454,575 people with diabetes and developed a web-based 10 year calculator using Cox proportional hazards models. They reported comparable C-statistics (≥0.73) and conducted external validation in 357 practices that used a different database. The principal limitation of this large study was the lack of validation of the diagnosis of blindness.
Risk engines have been developed for other conditions, including coronary heart disease and stroke, and for guiding lipid therapy. The UK Prospective Diabetes Study developed a risk engine for predicting coronary heart disease [14], now in its second version (UKPDS Outcomes Model 2).
We included clinical risk factors in our model. It has recently been suggested that retinopathy data are sufficient to develop a risk stratification to extend screening intervals for people at low risk [27]. This may prove to be a reasonable and pragmatic approach. We had to overcome significant challenges in developing a near-real-time data flow; this may be too difficult in some populations. However, we determined that including clinical data would aid acceptance amongst the professional community, offer better prospects for generalisability and allow inclusion of more frequent screening for high-risk individuals. Our view is supported by our own data [28] and those of others [29], and also by our patient expert group. We do recognise that, as yet, estimates of resource requirements for the effective introduction of our type of RCE are not available.
External validation of models is required before general implementation [30], but validation methods for an approach such as ours are not well developed. An RCE comprises two principal components: (1) the dataset containing the covariates and the outcome of interest; and (2) the mathematical model applied to that dataset. The application to a population is specific to that population. In addition to the interval censoring described above, screening data do not satisfy the proportional hazards assumption. This makes it problematic to use widely accepted statistics for assessing the effectiveness of diagnostic tools based on Kaplan–Meier methods. We therefore developed an approach to validation that takes these constraints into account, comprising dataset validation, model checking, internal validation (including data splitting, bootstrapping and the C-index) and estimation of sensitivities/specificities at specified intervals, all recognised internal validation methods [30]. An implementation phase will include model updating (temporal validation and model tuning) and the opportunity for comparative cross-population (external) validation to correct for potential overperformance [31].
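As a sketch of the internal validation machinery mentioned above, the following computes a concordance index (C-index) for binary screen outcomes by pairwise comparison of predicted risks, with a simple bootstrap over resampled cases. Both functions and the toy data are illustrative, not the study’s validation code.

```python
import random

def c_index(risks, events):
    """Pairwise concordance: the fraction of (event, non-event) pairs in
    which the event case received the higher predicted risk (ties 0.5)."""
    conc = pairs = 0.0
    for i in range(len(risks)):
        for j in range(len(risks)):
            if events[i] == 1 and events[j] == 0:
                pairs += 1
                if risks[i] > risks[j]:
                    conc += 1
                elif risks[i] == risks[j]:
                    conc += 0.5
    return conc / pairs

def bootstrap_c(risks, events, n_boot=200, seed=0):
    """Bootstrap distribution of the C-index: resample cases with
    replacement, skipping resamples that contain only one outcome class."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_boot):
        idx = [rng.randrange(len(risks)) for _ in range(len(risks))]
        e = [events[i] for i in idx]
        if 0 < sum(e) < len(e):
            out.append(c_index([risks[i] for i in idx], e))
    return out
```

The spread of the bootstrap values gives a rough uncertainty interval for the apparent C-index; optimism correction and temporal or external validation would sit on top of a sketch like this.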
We believe that the Liverpool RCE is feasible, reliable, safe and acceptable to patients. Implementing it in routine clinical practice could release substantial resources for targeting high-risk and hard-to-reach groups, and could improve cost-effectiveness. Based on the internal validations we have performed, it shows sufficient performance for local introduction. However, wider implementation will require an external validation process and testing of safety and acceptability in an RCT setting [31]. Investment in IT systems will be required to implement it in large-scale health systems such as the NHS, and to support further validation.