Introduction

A recent systematic analysis showed that glycaemia and diabetes are rising global hazards, with the number of diabetic individuals having more than doubled over three decades [1]. Effective identification of individuals at high risk of developing diabetes is a major priority in managing this epidemic. Various rules have been developed to predict the incidence of diabetes in different ethnic groups [2–5]. Routine clinical markers available without laboratory testing have been shown to be predictive of the development of diabetes [6–18] and adding biochemical measures, in particular fasting plasma glucose (FPG), improves predictive accuracy [7, 8, 12–17]. On the other hand, adding complex data such as the results of oral glucose tolerance tests and measurements of insulin levels and inflammatory markers into a simple clinical model only minimally improves risk prediction while increasing cost and inconvenience [10, 19]. Similarly, adding genetic information to conventional risk factors does not appear to greatly refine the prediction of diabetes risk [14, 20, 21].

Introducing HbA1c into a prediction model has been suggested to be effective in screening for future diabetes [14–16, 22–28], and a few models to predict the development of diabetes have recently been developed that concurrently include measurements of FPG, HbA1c and other blood markers [15, 25, 29]. After revising the diagnostic criteria for type 2 diabetes by introducing HbA1c in 2010 [30], the ADA published guidelines and recommendations for diagnosing diabetes based on published data or derived from expert consensus [31]. These guidelines emphasised again that ‘in addition to the long-standing criteria based on measurement of plasma glucose, diabetes can be diagnosed by demonstrating increased levels of HbA1c concentrations’ [31]. Since the diagnostic criterion of HbA1c ≥6.5% (48 mmol/mol) has been newly introduced not only by the ADA but also by the World Health Organization [32], the utility of HbA1c testing as a screening tool to predict future diabetes, especially estimating risk among individuals below the diagnostic threshold of 6.5% (48 mmol/mol), should be considered. In our previous study, we observed that elevations in the glycaemic markers FPG and HbA1c were strongly predictive of the development of diabetes [33]. Therefore, it would be reasonable to consider whether their addition to other possible predictors would allow us to devise a simple, sensitive and specific algorithm to predict diabetes.

The new diagnostic criteria for diabetes based on HbA1c do not require a fasting sample [30, 31]. This benefit deserves consideration in the development of a new scoring system by quantifying the additional predictive power of adding HbA1c to a risk score based on non-laboratory assessments (NLA). A recent study of older British men and women described a two-stage scoring system that combined a simple clinical assessment as the first step, with subsequent inclusion of HbA1c and other biochemical measures that could be assessed in a non-fasting state as the second step [16]. However, it has not been clarified to what extent available information on elevated levels of either HbA1c, FPG or both would improve overall predictive accuracy and risk reclassification after initial screening by a simple NLA score to establish a two-stage model. Whether HbA1c and FPG are equivalent tools for prediction in risk scores and whether their combination can provide a distinct advantage are important and practical questions at the present time, particularly given the energy expended and emphasis placed on establishing programmes for primary prevention. Therefore, in this study we aimed to assess the clinical significance of introducing HbA1c into a risk score and to develop and validate a scoring system to predict the 5 year incidence of diabetes in Japanese men and women. We evaluated the performance of the various diabetes risk scores that we developed as two-stage scoring systems to predict the 5 year incidence of type 2 diabetes.

Methods

Study population

The Toranomon Hospital Health Management Center Study (TOPICS) included a cohort consisting mainly of apparently healthy Japanese government employees who underwent annual examinations for health screening, in addition to some participants from the general public. All persons were interviewed at each examination using standard questionnaires that gathered information on demographic characteristics, health-related habits and medical history. A total of 29,584 individuals underwent a baseline health examination during the period from 1997 to 2002. To avoid overlapping of data from the same individuals who had health examinations in multiple years, we used data only from the first visit. Enrolled in this study were 9,344 individuals aged 24–82 years who underwent a re-examination 5 years after the initial examination. Individuals who had diabetes at the baseline examination (n = 397) or with missing data on baseline characteristics (n = 137) were then excluded. For reasons of clinical significance, we restricted the current analysis to individuals aged 40–75 years, the age group most likely to be screened for risk of diabetes. Consequently, 1,148 individuals (12% of the 9,344 individuals) under the age of 40 years and 28 over the age of 75 years were excluded. Twenty individuals met more than one exclusion criterion (missing data, baseline diabetes or age) and were therefore counted twice in the figures above; thus, overall 18% of individuals (n = 1,690) were excluded from the 9,344 individuals originally considered as study participants.

After these adjustments, 7,654 individuals (2,211 women and 5,443 men) aged 40–75 years comprised the cohort for derivation of the risk score to predict the 5 year incidence of diabetes. Diagnosis of type 2 diabetes was made according to the ADA criteria of an FPG level of 7.0 mmol/l or higher, self-reported clinician-diagnosed diabetes or HbA1c of 6.5% (48 mmol/mol) or higher [30]. Informed consent was obtained from all participants. The study protocol followed the Japanese government’s Ethical Guidelines Regarding Epidemiological Studies in accordance with the Declaration of Helsinki and was reviewed by the institutional review board at Toranomon Hospital.
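The exclusion counts and the case definition described above can be checked with a short sketch. The variable and function names are illustrative assumptions and are not taken from the study's analysis; the counts and thresholds are those reported in the text.

```python
# Counts reported in the study population description.
reexamined = 9_344          # individuals re-examined 5 years after baseline
baseline_diabetes = 397
missing_data = 137
under_40 = 1_148
over_75 = 28
counted_twice = 20          # individuals counted under two exclusion criteria

excluded = baseline_diabetes + missing_data + under_40 + over_75 - counted_twice
derivation_cohort = reexamined - excluded
pct_excluded = 100 * excluded / reexamined

women, men = 2_211, 5_443


def meets_diabetes_criteria(fpg_mmol_l, hba1c_percent, clinician_diagnosed=False):
    """Case definition used in the study: any one criterion is sufficient."""
    return fpg_mmol_l >= 7.0 or hba1c_percent >= 6.5 or clinician_diagnosed


if __name__ == "__main__":
    print(excluded, derivation_cohort, round(pct_excluded))
    print(meets_diabetes_criteria(6.2, 6.6))
```

The totals agree with the reported 1,690 exclusions (18%) and the derivation cohort of 7,654 individuals.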

Assessment of risk factors

Height and weight were measured without shoes or heavy clothing, and BMI (kg/m²) was calculated. Blood pressure was measured by trained hospital staff with the participant in a seated position. Individuals with a systolic blood pressure of 140 mmHg or higher or a diastolic blood pressure of 90 mmHg or higher, or who were under medical treatment, were considered to have hypertension. Current smoking habit, first-degree relatives (i.e., parent or sibling) with diabetes and self-reported histories of dyslipidaemia and cardiovascular disease were assessed using a standard questionnaire. Blood samples were collected after an overnight fast (12 h) and measured using an automatic clinical chemistry analyser (LABOSPECT 008, Hitachi, Tokyo, Japan). Blood glucose was measured by enzymatic methods and HbA1c was assessed by high-performance liquid chromatography. The value for HbA1c was estimated as the National Glycohemoglobin Standardization Program value (%), calculated by the formula HbA1c (%) = 1.02 × HbA1c (Japan Diabetes Society) (%) + 0.25% [34].
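As a minimal sketch, the derived variables defined in this subsection can be written as three small functions. The function and parameter names are assumptions made for illustration; the thresholds and the conversion formula are those stated above.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def has_hypertension(sbp_mmhg: float, dbp_mmhg: float, on_treatment: bool = False) -> bool:
    """Hypertension as defined in the text: SBP >= 140, DBP >= 90, or under treatment."""
    return sbp_mmhg >= 140 or dbp_mmhg >= 90 or on_treatment


def jds_to_ngsp(hba1c_jds_percent: float) -> float:
    """Convert a Japan Diabetes Society HbA1c value (%) to the NGSP value (%)."""
    return 1.02 * hba1c_jds_percent + 0.25


if __name__ == "__main__":
    print(round(bmi(70.0, 1.75), 1))
    print(has_hypertension(138, 92))
    print(round(jds_to_ngsp(6.0), 2))
```

For example, a Japan Diabetes Society value of 6.0% corresponds to 1.02 × 6.0 + 0.25 = 6.37% on the NGSP scale.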

Validation of the developed risk scores

The validity of the derived risk score was tested in a separate study population that underwent a first examination in 2003, 2004 or 2005, with prospective follow-up for 5 years (n = 1,976). Among the 1,976 individuals, we excluded those with diabetes (n = 106), younger than 40 years (n = 424) or older than 75 years (n = 2), or with missing data (n = 15); thus, 1,437 individuals aged 40–75 years were included in the validation study. During the 5 year follow-up, we documented 57 incident cases of diabetes for an incidence of 4.0%.

Statistical analysis

A logistic regression model was used to investigate independent predictors of the incidence of diabetes, and β coefficients, ORs and their 95% CIs were estimated. We initially tested all the variables in the univariate regression model to determine which were significantly predictive of the development of diabetes. In model building, we analysed data using a multiple logistic regression model with both backward elimination and forward selection methods from the initial model until we reached a final model to select independent and significant predictors without including FPG and HbA1c values. After the simplicity of the screening algorithms was also considered, we developed the simplest NLA model that was considered to have reasonable predictive abilities by performing logistic regression analysis with the forward selection method and a significance level of 0.05. The principal criteria for selecting variables in the developed NLA model included results of the Hosmer–Lemeshow goodness-of-fit test and changes in the area under the receiver operating characteristic (ROC) curve. In addition, whether information in the NLA model was widely available and would be known to participants without consulting a medical professional was also considered in the model building.
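The area under the ROC curve used as a selection criterion equals the probability that a randomly chosen individual who developed diabetes has a higher predicted risk than a randomly chosen individual who did not, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is an illustration only, not the software used in the study, and the predicted risks are hypothetical toy values.

```python
def roc_auc(risks_events, risks_nonevents):
    """Area under the ROC curve by pairwise comparison (Mann-Whitney form)."""
    wins = 0.0
    for p_event in risks_events:
        for p_nonevent in risks_nonevents:
            if p_event > p_nonevent:
                wins += 1.0
            elif p_event == p_nonevent:
                wins += 0.5  # ties contribute one half
    return wins / (len(risks_events) * len(risks_nonevents))


# Hypothetical predicted risks, for illustration only.
toy_events = [0.8, 0.4]
toy_nonevents = [0.3, 0.4, 0.1]
toy_auc = roc_auc(toy_events, toy_nonevents)

if __name__ == "__main__":
    print(toy_auc)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of the two groups. The pairwise form is quadratic in sample size, which is acceptable for a sketch but slower than rank-based implementations on large cohorts.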

To evaluate whether predictive ability was improved by adding FPG, HbA1c or both into the NLA model, we compared discriminative ability by calculating the area under the ROC curve, and statistical significance was tested using the method of DeLong and colleagues [35]. We also calculated net reclassification improvement (NRI) using three risk categories (<5%, 5–15% and >15%) and integrated discrimination improvement (IDI) [36]. To develop the risk scores for predicting the 5 year incidence of diabetes, we estimated point scores from the β coefficients of the multivariate logistic regression analysis. Interactions between sex and key variables in the score (obesity, current smoking habit, impaired fasting glucose or elevated HbA1c values) were also tested but significant interactions (p < 0.05) were not identified. Analysis was performed with IBM SPSS Statistics version 19 (IBM, Armonk, NY, USA) and STATA software version 11 (STATA Corp., College Station, TX, USA). Statistical significance was considered for p < 0.05.
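The two reclassification measures can be illustrated with a minimal sketch. It assumes the standard definitions of categorical NRI and of IDI and uses the three risk categories named above; the function names and the toy predicted risks are hypothetical, and the code is not the SPSS or STATA procedure used in the study.

```python
def risk_category(p):
    """Map a predicted 5 year risk to the categories <5%, 5-15% and >15%."""
    if p < 0.05:
        return 0
    if p <= 0.15:
        return 1
    return 2


def nri(old_risks, new_risks, events):
    """Categorical net reclassification improvement of the new over the old model."""
    up_e = down_e = up_n = down_n = 0
    n_events = sum(events)
    n_nonevents = len(events) - n_events
    for p_old, p_new, y in zip(old_risks, new_risks, events):
        move = risk_category(p_new) - risk_category(p_old)
        if y:
            up_e += move > 0
            down_e += move < 0
        else:
            up_n += move > 0
            down_n += move < 0
    # Upward moves are correct for events, downward moves for non-events.
    return (up_e - down_e) / n_events + (down_n - up_n) / n_nonevents


def idi(old_risks, new_risks, events):
    """Integrated discrimination improvement: change in mean risk separation."""
    def mean(values):
        return sum(values) / len(values)

    def separation(risks):
        with_event = [p for p, y in zip(risks, events) if y]
        without_event = [p for p, y in zip(risks, events) if not y]
        return mean(with_event) - mean(without_event)

    return separation(new_risks) - separation(old_risks)


# Hypothetical predicted risks for six individuals (1 = developed diabetes).
toy_events = [1, 1, 1, 0, 0, 0]
toy_old = [0.04, 0.10, 0.20, 0.03, 0.10, 0.20]
toy_new = [0.10, 0.20, 0.25, 0.02, 0.04, 0.10]
toy_nri = nri(toy_old, toy_new, toy_events)
toy_idi = idi(toy_old, toy_new, toy_events)

if __name__ == "__main__":
    print(toy_nri, toy_idi)
```

In the toy data, two of three events move up a category and two of three non-events move down, so the NRI is 2/3 + 2/3. How the boundary values of exactly 5% and 15% are assigned is an assumption of this sketch.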

Results

Derivation of diabetes risk score

During a 5 year follow-up period, we documented 289 incident cases of diabetes. Non-laboratory measurements of age, male sex, family history of diabetes, current smoking habit, BMI, resting heart rate, hypertension and self-reported history of dyslipidaemia were significantly associated with the development of diabetes in the univariate logistic regression model (Table 1). In the multiple logistic regression model with the forward selection method, BMI was most strongly predictive of future diabetes (Hosmer–Lemeshow test, p = 0.428; area under the ROC curve 0.653 [95% CI 0.620, 0.687]). Results of calibration by the Hosmer–Lemeshow test and discrimination of the model were p = 0.304 and area under the ROC curve 0.722 (95% CI 0.694, 0.750), respectively, after entering the four variables of family history of diabetes, sex, age and current smoking habit (NLA model in Table 2). In the final step, further adding data on resting heart rate and hypertension minimally improved calibration (Hosmer–Lemeshow test, p = 0.19), discrimination (0.732; 95% CI 0.705, 0.759), NRI (2.4%; 95% CI –1.3, 6.1) and IDI (0.2%; 95% CI 0.05, 0.4). Considering these results and simplicity for use in routine care settings, an NLA model was constructed with the five non-laboratory markers that could be assessed before clinical measurements were performed.

Table 1 Baseline characteristics of the participants and ORs for each variable to predict the 5 year incidence of diabetes
Table 2 Prediction models for 5 year incidence of type 2 diabetes using NLA, FPG and HbA1c

Predicting the development of diabetes by adding either FPG (model 2) or HbA1c (model 3) into the NLA model significantly (p < 0.0001) improved the area under the ROC curve to a similar degree (Table 2). No significant difference was observed in the discriminative ability between model 2 and model 3 (p = 0.435). When the addition of data on HbA1c, HDL-cholesterol and alanine aminotransferase that could be assessed in a non-fasting state into the NLA model (detailed data not shown) was analysed, the area under the ROC curve (0.864; 95% CI 0.842, 0.887), NRI (47.2%; 95% CI 38.0, 56.3%) and IDI (14.7%; 95% CI 12.4, 17.0%) indicated refined risk prediction, but similar to that of NLA + HbA1c (model 3) and NLA + FPG (model 2). When we added both FPG and HbA1c into the NLA model (model 4), the area under the ROC curve was improved (p < 0.0001) to 0.907 (95% CI 0.890, 0.925), which was the highest of the four models (p < 0.0001). NRI and IDI were also improved by adding information on FPG and HbA1c either singly or in combination into model 1 (p < 0.0001).

Then we developed four diabetes risk scores using the significant predictors in each model by categorising the variables for practical use in clinical settings (Table 3). Calibration by the Hosmer–Lemeshow test was p ≥ 0.05 for all of the models and showed a reasonable fit (see electronic supplementary material [ESM] Fig. 1). The estimated probability of developing diabetes 5 years later gradually escalated in association with higher risk scores (ESM Table 1). The area under the ROC curve of the NLA score was 0.708 (95% CI 0.679, 0.737); this risk score with omission of glycaemic measurements discriminated relatively well between individuals who did and did not develop diabetes (ESM Fig. 2). The two scores that included either elevated FPG levels or elevated HbA1c levels had greatly improved discrimination. The area under the ROC curve for the NLA + FPG + HbA1c score generated the highest discrimination of 0.887 (95% CI 0.871, 0.903) among the four risk scores (p < 0.0001). When we investigated screening performance according to cut-off values (Table 4), the NLA score of 6 points or higher had a sensitivity of 89.6% but specificity of only 32.3%. The ‘NLA + FPG score’ with 7 points or higher was associated with a sensitivity of 89.6% and high specificity at 63.9% and the ‘NLA + HbA1c score’ with 9 points or higher had a sensitivity of 86.9% and specificity of 63.1%. We found that the ‘NLA + FPG + HbA1c score’ with 11 points or higher generated the highest combination of sensitivity (83.7%) and specificity (79.0%) and had the best positive predictive value (13.5%) and positive likelihood ratio (3.99) among the risk scores. We also confirmed that the screening performance of the four diabetes risk scores was well validated in a population (n = 1,437) separate from the derivation cohort.
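The relationship between the reported sensitivity, specificity, positive likelihood ratio and positive predictive value can be checked with a short calculation. The sketch assumes that the positive predictive value follows from Bayes' rule with the derivation-cohort incidence (289 of 7,654) as the prevalence; the function and variable names are illustrative and not from the study.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def positive_predictive_value(sensitivity, specificity, prevalence):
    """PPV from Bayes' rule: true positives over all positives."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Values reported for the 'NLA + FPG + HbA1c score' at a cut-off of 11 points.
sens, spec = 0.837, 0.790
incidence = 289 / 7654  # 5 year incidence in the derivation cohort

lr_pos = positive_likelihood_ratio(sens, spec)
ppv = positive_predictive_value(sens, spec, incidence)

if __name__ == "__main__":
    print(round(lr_pos, 2), round(100 * ppv, 1))
```

Under this assumption the calculation gives a positive likelihood ratio of about 3.99 and a positive predictive value of about 13.5%, consistent with the values reported above. The low predictive value despite good sensitivity and specificity reflects the low 5 year incidence in the cohort.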

Table 3 Development of diabetes risk score to predict 5 year incidence of type 2 diabetes using NLA, FPG and HbA1c
Table 4 Screening performance of the developed diabetes risk scores for predicting future type 2 diabetes

Table 5 shows the improved discriminative ability and risk reclassification provided by the addition of FPG and/or HbA1c after screening by the NLA risk score. Results in which we calculated the NRI showed that additional information on elevated levels of either FPG or HbA1c appropriately reclassified participants into predicted 5 year risk categories. When both FPG and HbA1c were available after the initial screening by the NLA risk score, 56% of individuals who developed diabetes were appropriately reclassified and the NRI was 56.7% (95% CI 47.3%, 66.1%). The IDI was also the highest, that is, 10.9% (95% CI 9.7%, 12.1%), when FPG and HbA1c were introduced into the NLA risk score at the second screening test.

Table 5 Reclassification of 5 year predicted risk and change in risk discrimination for future type 2 diabetes after addition of glycaemic markers to a risk score that includes NLA

Discussion

Using information on family history of diabetes, obesity, current smoking habit and elevated levels of FPG and HbA1c within a non-diabetic range, we developed a simple, highly sensitive and specific scoring system to predict the 5 year risk of developing diabetes. Introducing HbA1c into a risk score that included FPG and non-laboratory measurements further refined the risk prediction and contributed to creating a valid and simple scoring system. In the past, various assessment tools have been developed for diverse ethnic groups [2–5] and the screening accuracy of some of these instruments has also been validated in external populations [6–11, 13, 18, 19, 37–39]. However, none of those validated scoring systems concurrently used HbA1c and FPG with other routinely available clinical markers to derive a feasible scoring system to predict risk of diabetes. Our study provides four diabetes risk algorithms that can be used both within and outside of clinical practice and in non-fasting and fasting states, and found that in this two-stage scoring system, after initial screening by the developed NLA risk score, subsequent available information on FPG, HbA1c or both precisely refined diabetes risk reclassification in Japanese men and women.

Combining information on FPG or impaired fasting glucose with a simple diabetes risk score has been reported to increase predictive ability [7, 8, 12–17]. A study reported that screening models using the combination of HbA1c, BMI and FPG accurately identified individuals at risk of future clinically diagnosed diabetes [22], although the factors that remained significant were different from those found in the present study. The EPIC-Potsdam Study reported that discrimination by the German Diabetes Risk Score (including anthropometry and lifestyle characteristics) [9] had an area under the ROC curve of 0.8465, which was improved by adding FPG (0.8672), HbA1c (0.8859) or both (0.8926) [14]. Our simple risk scoring systems with non-laboratory measures that include elevated FPG or HbA1c levels have similarly excellent discrimination with no significant difference. Results of the EPIC-Potsdam Study also suggested that risk reclassification was improved by adding FPG (IDI 0.0553) or HbA1c (IDI 0.0974) values using the prediction model by the German Diabetes Risk Score as a reference [14]. Recent prospective studies have examined the utility of introducing HbA1c testing for predicting diabetes [15, 16, 23–27], and some reports have described the development of models to predict future diabetes using blood variables including FPG and HbA1c [15, 25, 29]. More recently and concomitantly with the preparation of this manuscript, a risk score that concurrently included measurements of FPG, HbA1c and other biochemical markers was reported in a Korean population [40]. However, whether HbA1c and FPG are equivalent tools for prediction in risk scores, and whether their combination provides a distinct advantage, are important and practical questions at the present time, particularly given the energy expended and emphasis placed on establishing programmes for primary prevention. Our study suggests that adding elevated HbA1c levels within a non-diabetic range into a diabetes risk score that includes non-laboratory measurements improves its screening performance, and that HbA1c and FPG equally contribute in a major way to our risk scoring system.

Wannamethee et al developed a risk score using simple blood markers and HbA1c values that were not dependent on a fasting state and could therefore be used at any time of day [16]. They also suggested that an approach using simple clinical assessments in the first instance followed by the use of routine blood markers that do not require a fasting sample might be cost-effective in identifying those most likely to be at high risk for diabetes [16]. However, it has not been clarified whether adding both FPG and HbA1c into risk scoring systems will generate a distinct advantage for improving overall screening performance. We found that after initial screening using a screening score based on non-laboratory measurements, additional information on either elevated FPG levels or elevated HbA1c levels within a non-diabetic range improved risk prediction to a similar extent in our study population. Although which score will be most useful for predicting diabetes risk will depend on the individual’s situation and the particular screening setting, our study results suggest that if both glycaemic data are available to add to the NLA risk score, clinicians might more precisely estimate the risk of diabetes development and take measures with their patients to prevent onset of the disease.

Although whether the developed tools might be useful beyond the Japanese population must be evaluated in other ethnic groups, our scores included common risk factors for diabetes, such as age, family history of diabetes, sex and obesity, compared with components of previously developed risk scores in other ethnicities [2–4]. A current smoking habit as a modifiable factor was also included in our developed scores. A meta-analysis has shown that being an active smoker is associated with a 1.44-fold higher risk of developing diabetes compared with being a non-smoker [41]. According to a recent study that reported the DETECT-2 update of the Finnish diabetes risk questionnaire, adding information on smoking and family history of diabetes into the original Finnish risk questionnaire improved its predictive ability [42]. Our results indicate that control of these two modifiable risk factors, that is, obesity and current smoking, should be given priority in preventing diabetes regardless of glycaemic status. Nonetheless, as to the magnitude of the effect of each risk factor on the incidence of diabetes, on average the BMI in Asian individuals is lower than that in other populations [43], and Asian individuals tend to develop diabetes at lower BMI levels than white individuals [44]. It also should be considered that results of performance of HbA1c as a screening test might differ according to race [45]; therefore, whether the thresholds shown in this study are universally useful should be validated in other ethnic groups.

Since the utility of HbA1c as a screening tool for individuals at risk of diabetes has been recognised [46, 47], it is reasonable that the use of two glycaemic indicators would be considered in developing an effective screening score for risk of diabetes. Our findings suggest that a scoring system should not be limited to using data from key conventional variables, as reported in the past, but should also include HbA1c in the development of a diabetes risk score to further improve screening for future diabetes. However, measuring both increases the cost of screening, and the cost-effectiveness of the use of HbA1c in a two-stage test or as a simultaneous test with FPG in screening to predict diabetes should be considered. Nonetheless, studies confirm that conducting screening is more cost-effective than not performing screening, both from the health system and societal perspectives [48, 49]. An analysis of cost-effectiveness showed that in a comparison with no screening with simulated screening strategies, screening would theoretically reduce the incidence of microvascular complications and myocardial infarction and increase the number of quality-adjusted life-years over a follow-up period of 50 years [49]. A recent validation study of previously developed diabetes risk scores indicated that a risk score by Kahn et al that included laboratory measurements [13] had a high discriminatory value [18]. It was also reported that the Finnish diabetes risk score without blood testing [6] might be a more practical and less expensive screening test [18], although additional costs resulting from the necessary follow-up of individuals with a false-positive test result would be a consideration [50]. Performance of our risk scores with a true two-stage step-wise strategy would need to be evaluated in future research.

One strength of our study is the availability of data on a large number of individuals over a lengthy period. An additional strength is that this study addresses an important topic that will contribute to more accurate risk classification and allow strategies to be implemented to prevent the development of diabetes. Several limitations must be considered. Since our cohort consisted of Japanese individuals who had annual health check-ups, the generalisability of our results should be investigated in general populations. Our cohort included mainly men (71% of total participants) and the database did not allow us to conduct sex-stratified analyses to develop separate risk scores for men and women. In addition, from this prospective study we cannot suggest that our developed NLA score might be used as a self-assessment tool to identify individuals with presently unknown diabetes.

In conclusion, our study demonstrates that by adding information on HbA1c into a risk scoring system that included FPG, family history of diabetes, current smoking habit and obesity, we could develop a simple, sensitive and specific algorithm that was clinically relevant to predict the 5 year risk of diabetes. Our risk score may contribute to predicting the future development of diabetes and thereby identifying individuals who will likely benefit from early interventions.