Introduction

Systematic risk factor screening is an increasingly popular method of delivering population-level cardiovascular disease prevention [1]. Frequent clustering of risk factors, co-existence of potentially preventable microvascular morbidities and a particularly protracted, deleterious, asymptomatic phase render type 2 diabetes mellitus amenable to approaches directed towards earlier identification and intervention.

Screening is now recognised by many organisations as an economically viable [2] and relatively harmless [3] procedure that probably reduces cardiovascular events in people with previously undiagnosed disease [4]. Although there is some evidence that even unrestricted screening programmes are cost effective in reducing vascular complications in people with type 2 diabetes mellitus [5], recommended strategies are likely to continue targeting high-risk individuals to theoretically maximise efficiency and diagnostic yield. In addition, as lifestyle and/or pharmacological based interventions have repeatedly been shown to delay the onset of type 2 diabetes mellitus in people with impaired glucose tolerance (IGT) [6], enthusiasm to identify such cases is also growing.

Various types of risk score have been developed for the purpose of identifying those who would benefit from screening for undiagnosed diabetes and, in some cases, the glycaemic category of impaired glucose regulation (IGR). Non-invasive risk scores can use a number of approaches including being applied as a questionnaire to the individual being assessed (self assessment) or as a query to a general practice database where all those at high risk of having undiagnosed type 2 diabetes mellitus or IGR are identified using routinely collected data. Increasingly, high quality patient data are being stored electronically [7]. Optimising routinely collected data may have advantages over reliance on people to self-refer for testing and can be used to implement national screening programmes, such as the NHS Health Checks Programme in the UK [8].

Recently the WHO recommended that the 1999 diagnostic criteria be changed and that those with an HbA1c ≥6.5% (48 mmol/mol) should be included in the diagnosis of type 2 diabetes mellitus [9, 10]. The report found that currently there is insufficient evidence to classify IGR using HbA1c [10]; however, HbA1c between 6.0% (42 mmol/mol) and 6.4% (46 mmol/mol) has been suggested [11]. Although a number of relatively simple questionnaires and automated database diabetes risk scores have been developed to identify people most likely to benefit from definitive testing [12–14], none of these scores have taken into account the new diagnostic criteria for type 2 diabetes mellitus in their development.

The aim of this study was to develop and validate a score that can be used in a multiethnic population based on variables that are stored within primary care databases for identifying those with undiagnosed IGR on an OGTT, or type 2 diabetes mellitus based on either the WHO 1999 or 2011 diagnostic criteria for invitation for further testing.

Methods

Ethics approval

Ethics approval was obtained from the local ethics committee for both ADDITION-Leicester and STAR and both were carried out in accordance with the Declaration of Helsinki as revised in 2000. All participants included in both STAR and ADDITION gave informed consent.

Data set

To develop the score we used data from 6,390 participants aged 40–75 years from the ADDITION-Leicester population screening study (trial registration no. NCT00318032). This study has been described in detail elsewhere [15]. In summary, ADDITION-Leicester invited a randomly selected 30,950 people without diagnosed diabetes from Leicester and the surrounding county for screening between 2004 and 2008; 6,749 individuals attended screening (response rate 22%). All screened participants received an OGTT using 75 g of glucose, and had biomedical and anthropometric measurements taken by a trained member of the research staff, which included data such as medical history, medication, BMI, blood pressure, and a self completed questionnaire. The questionnaire collected data on smoking status, alcohol consumption, occupational status, ethnicity, physical activity, the FINDRISC score [16] and a number of scales to measure domains such as wellbeing and anxiety.

In ADDITION-Leicester, participants were classified as falling into the glycaemic categories of impaired fasting glucose (IFG), IGT or type 2 diabetes mellitus, according to WHO 1999 criteria [9]. For this study IGR refers to the composite of IGT and/or IFG. HbA1c was collected for all participants at baseline.
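The glycaemic categories above can be illustrated with a short sketch. The glucose thresholds below are the commonly quoted WHO 1999 venous plasma cut points and the percentage-to-mmol/mol conversion is the usual IFCC/NGSP master equation; neither is stated in this paper, so both should be read as assumptions, and the function names are hypothetical.

```python
def classify_ogtt(fasting_mmol_l, two_hour_mmol_l):
    """Classify a 75 g OGTT result (assumed WHO 1999 plasma glucose cut points)."""
    # Diabetes: fasting >= 7.0 mmol/l or 2 h >= 11.1 mmol/l
    if fasting_mmol_l >= 7.0 or two_hour_mmol_l >= 11.1:
        return "T2DM"
    # IGT: 2 h value 7.8-11.0 mmol/l (people with both IFG and IGT land here)
    if two_hour_mmol_l >= 7.8:
        return "IGT"
    # IFG: fasting value 6.1-6.9 mmol/l with a normal 2 h value
    if fasting_mmol_l >= 6.1:
        return "IFG"
    return "normal"


def is_igr(category):
    """IGR is the composite of IGT and/or IFG, as defined in the text."""
    return category in ("IGT", "IFG")


def hba1c_percent_to_mmol_mol(percent):
    """Convert HbA1c from % (NGSP) to mmol/mol (IFCC); assumed master equation."""
    return 10.93 * percent - 23.50
```

Under these assumptions, the conversion rounds 6.0%, 6.4% and 6.5% to the 42, 46 and 48 mmol/mol values quoted in the text.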

Variables considered

The variables considered for inclusion in the score were limited to those that are included in the ‘typical’ general practice database with a good level of reliability and completeness. The consensus is that the following items satisfy these conditions: age, sex, BMI, ethnicity (white European or other), family history (of type 1 or type 2 diabetes mellitus), smoking status (current smoker vs ex- or non-smoker), prescribed antihypertensives, statins or steroids, history of cardiovascular disease (myocardial infarction, stroke, heart valve disease, atrial fibrillation, angina, angioplasty or peripheral vascular disease) and deprivation (measured using the Index of Multiple Deprivation [IMD] calculated from the individual’s postcode). This pool of variables covers the majority of those included in previously developed screening tools and screening guidelines.

Modelling and internal validation

All modelling was carried out in Stata (version 11.1) using logistic regression with the composite of IGR (defined as IFG or IGT on OGTT [not including HbA1c 6.0–6.4% at this stage]) or type 2 diabetes mellitus (OGTT or HbA1c ≥6.5% [48 mmol/mol]) vs normal as the dependent variable. A non-automated approach was taken for variable selection; initially each variable was modelled to see if it independently predicted the outcome. Sets of predictors shown to be independently related were then considered. Once an additive model was established we assessed all possible two-way interactions and the addition of polynomial terms. At each step the area under the receiver operating characteristic (ROC) curve was used to compare models in addition to the p value for the covariate of interest. The importance of introducing functional polynomial terms was also assessed using Akaike’s information criterion (AIC) [17].
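The two model comparison measures named above can be sketched as follows. This is not the Stata code used in the study; it is a minimal standard-library Python illustration, with hypothetical function names, of how the area under the ROC curve and the AIC are computed from a fitted model's predicted probabilities.

```python
import math


def roc_auc(y_true, y_score):
    """Area under the ROC curve as the probability that a randomly chosen case
    scores higher than a randomly chosen non-case (ties count one half)."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    non_cases = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def log_likelihood(y_true, y_prob):
    """Bernoulli log-likelihood of the outcomes given predicted probabilities
    (probabilities are assumed to lie strictly between 0 and 1)."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for y, p in zip(y_true, y_prob))


def aic(log_lik, n_params):
    """Akaike's information criterion; lower values indicate a better trade-off
    between fit and the number of estimated parameters."""
    return 2 * n_params - 2 * log_lik
```

For two candidate models fitted to the same participants, the one with the higher area under the curve discriminates better, and an added term is only worth keeping if it lowers the AIC.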

Creating scoring system

The score was derived by summing each of the β coefficients from the best fitting model. The discrimination of the score was assessed using the area under the ROC curve. Calibration was assessed using the Hosmer–Lemeshow statistic [18].
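As a sketch of how such a score might be computed for one record in a practice database, the example below sums β coefficients multiplied by the corresponding covariate values. The variable names and every coefficient value are hypothetical placeholders chosen for illustration; they are not the coefficients of the fitted model.

```python
import math

# Hypothetical coefficients for illustration only; NOT the published model.
HYPOTHETICAL_BETAS = {
    "age_years": 0.03,
    "male": 0.20,
    "bmi": 0.06,
    "non_white_european": 0.50,
    "antihypertensive": 0.30,
    "family_history": 0.25,
}


def risk_score(person, betas=HYPOTHETICAL_BETAS):
    """Sum of beta * covariate value; binary covariates are coded 0 or 1."""
    return sum(beta * person[name] for name, beta in betas.items())


def predicted_probability(score, intercept):
    """Map the score to a probability with the logistic function.
    The intercept is an assumed extra parameter taken from the fitted model."""
    return 1.0 / (1.0 + math.exp(-(intercept + score)))


example_person = {
    "age_years": 60, "male": 1, "bmi": 30,
    "non_white_european": 0, "antihypertensive": 0, "family_history": 0,
}
example_score = risk_score(example_person)
```

Because the logistic function is monotonic, ranking people by the score gives the same ordering as ranking them by predicted probability, so the intercept is only needed when an absolute probability is wanted.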

External validation

Data from a second screening study, the Screening Those At Risk (STAR) study, were used for external validation of the score [19]. This was a cross-sectional screening study using a 75 g OGTT. Of 3,004 participants aged 40–75 years with at least one risk factor, 33.7% were from a black or minority ethnic background (mostly South Asian). The external validation was carried out using six different outcomes that reflect how the score will be used in clinical practice, i.e. one method of diagnosis will be chosen: (1) type 2 diabetes diagnosed using OGTT; (2) type 2 diabetes diagnosed using HbA1c; (3) IGR defined as IGT or IFG on OGTT; (4) HbA1c between 6.0% and 6.4%; (5) type 2 diabetes or IGR on OGTT; (6) HbA1c ≥6.0%. The ROC curve was plotted for each outcome and the area under the curve was calculated. The sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), likelihood ratio for a positive test (LR+), and likelihood ratio for a negative test (LR−) with 95% CI were calculated comparing each cut point on the score to the outcome.
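The accuracy measures listed above all follow from the two-by-two table formed at a given cut point. The sketch below uses a hypothetical function name, assumes that a score at or above the cut point counts as a positive test, and omits the confidence intervals.

```python
def _ratio(numerator, denominator):
    """Safe division; returns NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(y_true, scores, cut_point):
    """Two-by-two table measures for the rule: score >= cut_point is positive."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        positive = s >= cut_point
        if positive and y == 1:
            tp += 1
        elif positive and y == 0:
            fp += 1
        elif not positive and y == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "lr_pos": _ratio(sensitivity, 1.0 - specificity),
        "lr_neg": _ratio(1.0 - sensitivity, specificity),
    }


# Toy data: three cases and five non-cases, cut point of 3
metrics = diagnostic_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                             [5, 4, 1, 4, 2, 1, 1, 0], cut_point=3)
```

Sensitivity, specificity and the likelihood ratios do not depend on prevalence, whereas PPV and NPV do, which is why the predictive values are specific to the population in which the score is applied.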

Results

The characteristics of those included in ADDITION-Leicester over 40 years of age are given in Table 1. The mean age was 57.3 years with 48% being male. Three-quarters of the cohort were white European, with 23.5% of other ethnicity (of which the majority were South Asian, 91%). Twenty‐two per cent had either IGR (OGTT) or type 2 diabetes mellitus (OGTT or HbA1c ≥6.5% [48 mmol/mol]), and 7.6% had type 2 diabetes mellitus on OGTT or HbA1c ≥6.5% (48 mmol/mol). The external validation cohort (STAR data set) had similar characteristics but with a slightly higher number of people with IGR or type 2 diabetes mellitus (24%) and reporting that they were smokers (25% vs 14%).

Table 1 Characteristics of the data sets used for model building and external validation

Score development

Table 2 shows the final model produced. Age, sex (male vs female), BMI, ethnicity (‘other’ vs white European), antihypertensive therapy (yes vs no) and family history of diabetes (any type, yes vs no) were all found to be significant predictors of IGR or type 2 diabetes mellitus both when modelled separately and together. Adding other variables did not improve the area under the ROC curve. There were no statistically significant two-way interactions, assessing significance at the 1% level, because of the high number of comparisons. Polynomial terms were considered for age and BMI but this did not improve the fit of the model.

Table 2 The association between the set of risk factors included in the score and the glycaemic categories of IGR and type 2 diabetes

The area under the ROC curve for the final model was 70.1% (95% CI 68.4%, 71.7%). Figure 1 shows the observed vs the estimated prevalence of IGR and type 2 diabetes mellitus grouped by the predicted probability. This shows overall good agreement between the observed and predicted estimates. This is reflected in the result of the Hosmer–Lemeshow test based on ten groups (χ2 = 2.4, p = 0.97).

Fig. 1

Comparison of the observed vs the estimated prevalence of IGR or type 2 diabetes mellitus grouped by decile of predicted probability of IGR or type 2 diabetes. Black circles, observed; white circles, expected. T2DM, type 2 diabetes mellitus
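The calibration check can be sketched as follows. This is a minimal standard-library illustration with hypothetical function names, not the Stata routine used in the study; it assumes groups of near-equal size formed by ranking on predicted probability, predicted probabilities strictly between 0 and 1, and a χ2 reference distribution with (number of groups − 2) degrees of freedom.

```python
import math


def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Hosmer-Lemeshow statistic over groups ranked by predicted probability."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        n_g = len(chunk)
        observed = sum(y for _, y in chunk)   # observed number of cases
        expected = sum(p for p, _ in chunk)   # expected number of cases
        mean_p = expected / n_g
        statistic += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return statistic


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Ten groups give 8 degrees of freedom under the assumption above
p_value = chi2_sf_even_df(2.4, 8)
```

With the reported statistic of 2.4 and 8 degrees of freedom, this calculation gives a p value of about 0.97, in line with the value reported above; a large p value indicates no evidence of poor calibration.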

External validation

The performance of the score in differentiating between those who had IGR or type 2 diabetes mellitus diagnosed using either an OGTT or HbA1c and those who had normal glucose tolerance in the external data set is shown in Table 3 and Fig. 2a–f. The score can be used in two ways: either by setting the sensitivity to a certain level or by deciding what percentage of the general practice population to invite for further testing. If using an OGTT for diagnosis, 50% of a general practice would need to be invited for testing to detect type 2 diabetes mellitus with 80% sensitivity; this rises slightly to 54% if using HbA1c. To retain 80% sensitivity for the IGR outcomes, the percentage invited would need to be increased to 60% if using an OGTT and 66% for an HbA1c between 6.0% (42 mmol/mol) and 6.4% (46 mmol/mol). If the top 10% were invited for testing, 9% of these would have type 2 diabetes mellitus using an OGTT (PPV 8.9% [95% CI 5.8%, 12.8%]) and 26% would have IGR (PPV 25.9% [95% CI 20.9%, 31.4%]). Using HbA1c increases the PPV to 19% for type 2 diabetes mellitus (PPV 18.6% [95% CI 14.2%, 23.7%]) and 28% for an HbA1c between 6.0% and 6.4% (PPV 28.3% [95% CI 23.1%, 34.0%]). If screening for both type 2 diabetes mellitus and IGR using an OGTT, inviting the top 10% for further testing gives a sensitivity of 17%. The high NPV (81.3%) suggests that this cut point is good for ruling out disease.
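The two ways of using the score can be sketched on a ranked list. The function names and toy data below are hypothetical; tied scores are broken arbitrarily by sort order, which a real implementation would need to handle explicitly.

```python
def top_fraction_performance(y_true, scores, fraction):
    """Invite the top `fraction` of people ranked by score; report the yield."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_invited = max(1, round(fraction * len(ranked)))
    cases_found = sum(y for _, y in ranked[:n_invited])
    return {
        "invited": n_invited,
        "sensitivity": cases_found / sum(y_true),
        "ppv": cases_found / n_invited,
    }


def fraction_for_sensitivity(y_true, scores, target):
    """Smallest fraction of the ranked list that reaches the target sensitivity."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    total_cases = sum(y_true)
    found = 0
    for n_invited, (_, y) in enumerate(ranked, start=1):
        found += y
        if found / total_cases >= target:
            return n_invited / len(ranked)
    return 1.0


# Toy practice list of ten people, three of whom have the outcome
outcomes = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
risk = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
top_20 = top_fraction_performance(outcomes, risk, 0.2)
needed = fraction_for_sensitivity(outcomes, risk, 0.66)
```

Fixing the fraction invited controls workload and gives the PPV as the expected yield per test, whereas fixing the sensitivity controls the proportion of cases found and lets the workload vary.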

Table 3 Predictive performance of the score using the external (STAR) data set for identifying glycaemic categories using either OGTT or HbA1c at set levels either of sensitivity or the percentage of the population invited for further testing
Fig. 2

a–c ROC curves for type 2 diabetes (a), IGR (b), and type 2 diabetes or IGR (c), using OGTT. (a) Area under ROC curve = 0.7056; (b) area under ROC curve = 0.6625; (c) area under ROC curve = 0.6851. d–f ROC curves using HbA1c. (d) HbA1c ≥6.5% (48 mmol/mol), area under ROC curve = 0.6939; (e) HbA1c 6.0–6.4% (42–46 mmol/mol), area under ROC curve = 0.6220; (f) HbA1c ≥6.0%, area under ROC curve = 0.6673

Discussion

We have developed a simple and sensitive automated screening tool for use in multiethnic populations that will enable primary care practitioners to rank individuals by their risk of having undiagnosed IGR or type 2 diabetes mellitus and therefore allow targeting of screening resources. Ranking people by risk allows flexibility in the screening strategy chosen; practices can choose to home in on the top of the list and invite fewer people for screening for a bigger ‘hit’ rate or, if resources allow, to widen their inclusion criteria, giving greater sensitivity at the expense of specificity.

Although some existing scores have been validated against HbA1c [20, 21], this is the first to be developed incorporating the new WHO diagnostic criteria into the outcome. Previous work has shown that different cohorts are detected using either an OGTT or an HbA1c to diagnose type 2 diabetes mellitus [22]. Previously developed scores may now miss people who meet the new diagnostic criteria. This is also the first computer based score developed in a multiethnic population within the UK to identify prevalent disease. The Cambridge Risk Score was designed to identify undiagnosed diabetes only and does not adjust for ethnicity [13]. Although ethnicity was not taken into account in the original score, a post hoc study using data from both Caribbean and South Asian populations showed that using alternative ethnic specific cut points could give acceptable levels of prediction for undiagnosed hyperglycaemia in these groups, but that further work needed to be carried out to refine these [23]. The QDScore predicts the 10 year risk of developing diabetes and includes similar variables to both the Cambridge Risk Score and the score developed here, but with the addition of deprivation and cardiovascular disease (both of these were found not to improve the fit of the models produced here) [14]. Compared with the Cambridge Risk Score, the QDScore showed greater levels of discrimination, but only detects incident disease. In addition, the algorithm to compute the QDScore has not been published, and the score cannot be used to detect IGR. Other scores, including the Leicester Self Assessment score and the FINDRISC score, have been developed, which rely on the person at risk completing a questionnaire themselves and attending the GP practice [12, 16]. The score developed here may increase the uptake of screening invitations by removing the need for people to calculate their own risk.

We have previously developed a score based solely on the OGTT [1, 24]. The Let’s Prevent Diabetes trial has used this score to identify those at the greatest risk of undiagnosed IGR or type 2 diabetes mellitus within a general practice. Those ranked at high risk are invited for screening with an OGTT; those with IGR are then randomised into a diabetes prevention trial. This programme is yet to be completed but, to date, about 30% of those screened have some form of abnormal glucose tolerance [24]. The Walking Away from Diabetes programme uses the score in a similar manner; this programme found that, of those screened, 29.5% were found to have IGR [25]. These two examples demonstrate that this tool can be successfully used in clinical practice. This score has also been shown to be cost effective in a modelling study [26].

Although the score was developed using high fidelity data from a randomly selected population who all received an OGTT, there are a number of limitations to be taken into account when applying the score. First, the cross-sectional nature of the data limits the score to detecting prevalent undiagnosed disease. This score cannot, therefore, be used to estimate the risk of future disease, although detecting IGR will identify a high-risk group who are likely to develop type 2 diabetes mellitus in the future. Although this could be viewed as a limitation, screening strategies may want to focus on those who have current undetected disease as a priority. In addition, those scores predicting incident disease may give biased estimates as those variables that are included in the score are also those that prompt testing. Future work will look at validating the score on a prospective data set. Second, only 22% of those invited for screening in the ADDITION-Leicester study attended. Although this is similar to other studies in similar populations [27] and reflects the difficulty in recruiting a multiethnic urban population with wide variations in socio-economic status into research studies, this may have affected the representativeness of the data that the score has been derived from. For example, those screened were slightly older than those invited [28]. It is difficult to predict the possible implications of the response rate to the initial study on the score produced. Reassuringly, the score contains a similar set of variables to other comparable risk scores [13, 16]. Future work will further validate the score on other population based data sets. Finally, the score was developed using data from Leicester (UK). The ethnic makeup of this area means that the ethnicity component of the score is based on data from South Asian participants (mostly of Indian descent). 
Although there were participants included from other ethnicities in both data sets (such as Chinese, Caribbean and African), there were insufficient data to model separate scores for each ethnicity. South Asians are known to have a high level of risk [29], and therefore assuming the same level of risk for all black and minority ethnic groups may overestimate the risk for some, but this was thought to be preferable to underestimating risk or estimating risk based on insufficient data.

In summary, we have developed a valid and sensitive score for identifying those at the highest risk of prevalent IGR or type 2 diabetes mellitus within a multiethnic UK population. An automated tool is simple to implement and can be used to target screening approaches in a cost effective manner. For example, in the UK, this tool could be used to complement the NHS Health Checks programme as the score has been developed using data that reflect the inclusion criteria of the health checks.