Considerable evidence has been presented on the rise in diabetes prevalence in the United States and the United Kingdom [1, 2]. The prevalence of diabetes has become so high that it has been termed an epidemic [1, 3]. This rise is particularly important for healthcare needs in the US because almost 30% of individuals with diabetes are currently undiagnosed and diabetes is disproportionately represented in minority populations [4].

Projection of future disease prevalence helps to plan for healthcare needs. An understanding of the population at risk of developing the disease is critical when projecting future disease burden. Several studies have projected future diagnosed diabetes prevalence for the US and other countries [2, 5–9]. These studies, however, did not consider that not all individuals are equally at risk of developing diabetes, thereby possibly distorting estimates of downstream prevalence. For example, some risk factors for diabetes, e.g. obesity, have increased substantially in the population [10–12]. Moreover, many of these projections are based on estimates of diagnosed diabetes and exclude estimates of total diabetes (diagnosed and undiagnosed), which may lead to serious underestimation of the diabetes burden in the population.

Major risk factors for diabetes have been identified and are currently used by the American Diabetes Association to guide screening strategies. Although there are various measures for assessing the risk of having undiagnosed diabetes [13–17], few measures are available for assessing the risk of developing diabetes [18, 19]. Moreover, accounting for changes in the proportion of high-risk individuals, particularly as assessed through clinical indicators, has not been incorporated into previous projections of future diabetes burden.

The purpose of this study was to project the prevalence of diabetes for the adult US population up to 2031, using models based on data contained in the nationally representative National Health and Nutrition Examination Survey (NHANES) II mortality survey (1976–1992), NHANES III (1988–1994) and NHANES 1999–2002.

Materials and methods

Diabetes prevalence model

The model for diabetes prevalence used in this study was created using data from the NHANES III (1988–1994), and then fitted to data from the NHANES 1999–2002 as a validity check of the accuracy of the model’s projections. The resulting model was then used to project the number of individuals with diabetes in the US in 10-year increments into the future. We evaluated 10-year age classes at each 10-year interval. Our model has the following components:

  1. The number of individuals with diabetes at time 2:

     $$N_{\text{Time 2}} = \sum_i \left( N_{\text{Time 1},\,i} + \text{incident cases}_i - \text{mortality}_i \right)$$

     where N is the number of individuals with diabetes, i equals each 10-year age group, and incident cases consist of: (1) persons converting from a disease-free state to having diabetes; (2) diabetic patients immigrating to the United States; and (3) persons with diabetes moving into the 20 to 29-year-old age class.

  2. The percentage of persons with diabetes, which is calculated thus:

     $$\text{percentage diabetes}_{\text{Time 2}} = \frac{N_{\text{Time 2}}}{\text{total population}_{\text{Time 2}}} \times 100$$

     The estimate of future diabetes is therefore based on this equation, including the number of individuals with diabetes in the previous time period, conversion to diabetes, migration, and mortality, rather than being a linear extrapolation of the change in diabetes prevalence from the known values of 1991 and 2001.
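The two model components can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name, field names and all numbers are hypothetical, and the sketch follows the equation as written, without modelling how cohorts age from one class to the next.

```python
from dataclasses import dataclass


@dataclass
class AgeClass:
    """Counts for one 10-year age class (hypothetical field names)."""
    label: str
    diabetes_t1: float     # persons with diabetes at time 1
    incident_cases: float  # converters + immigrants with diabetes + entrants to the 20-29 class
    deaths: float          # deaths among persons with diabetes over the interval


def diabetes_count_t2(age_classes):
    """Component 1: sum over age classes of (time-1 cases + incident cases - mortality)."""
    return sum(a.diabetes_t1 + a.incident_cases - a.deaths for a in age_classes)


def diabetes_percentage(count_t2, total_population_t2):
    """Component 2: persons with diabetes as a percentage of the total population."""
    return count_t2 / total_population_t2 * 100.0


# Made-up counts in millions, for illustration only.
classes = [
    AgeClass("20-29", diabetes_t1=0.5, incident_cases=0.2, deaths=0.0),
    AgeClass("30-39", diabetes_t1=1.0, incident_cases=0.5, deaths=0.1),
]
count = diabetes_count_t2(classes)
pct = diabetes_percentage(count, total_population_t2=50.0)
```

Keeping the three flows (baseline cases, incident cases, deaths) as separate fields makes it straightforward to vary one of them at a time.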

Data sets

The NHANES is a programme of surveys conducted by the National Center for Health Statistics and designed to assess the health and nutritional status of adults and children in the United States. The survey is unique on a national level in that it combines interviews and physical examinations. The NHANES uses a complex multistage sampling design, making it representative of the non-institutionalised US population and allowing weighted estimates to be computed.

For this study we used several of the NHANES data sets. Specifically, we used the NHANES III (1988–1994) (unweighted n = 4,950) and the NHANES 1999–2002 (unweighted n = 3,804) to estimate, among individuals of 20 years of age and older, the prevalence of diagnosed diabetes, the total diabetes burden (diagnosed and undiagnosed diabetes), and the proportion of the population at risk of developing diabetes. Since mortality within the population affects future prevalence [20], we also used the cohort from the NHANES II mortality survey (1976–1992) (unweighted n = 3,916) to provide estimates of diabetes mortality. All analyses of the NHANES data sets accounted for the complex survey design and applied the appropriate sample weights, so that the estimates entering the models are nationally representative. All analyses were conducted using SUDAAN software (Research Triangle Institute, Research Triangle Park, NC, USA).
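To illustrate what a weighted estimate means here, the following Python sketch computes a weighted prevalence as a point estimate. It is an assumption-laden illustration only: the function name and data are hypothetical, and it does not reproduce the design-based variance estimation that survey software such as SUDAAN performs.

```python
def weighted_prevalence(has_condition, weights):
    """Weighted percentage: weights of cases divided by the sum of all weights."""
    if len(has_condition) != len(weights):
        raise ValueError("flags and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("sum of weights must be positive")
    cases = sum(w for flag, w in zip(has_condition, weights) if flag)
    return 100.0 * cases / total


# Four hypothetical respondents; each weight is the number of people represented.
flags = [True, False, False, True]
sample_weights = [1000.0, 3000.0, 4000.0, 2000.0]
example_pct = weighted_prevalence(flags, sample_weights)
```

In this toy sample, half of the respondents have the condition, but they represent only 3,000 of the 10,000 people covered by the weights, which is why the weighted and unweighted estimates differ.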

Variables used in models

Prevalence of diabetes

Diabetes burden was assessed as diagnosed diabetes plus undiagnosed diabetes. Because of the substantial proportion of people with undetected diabetes, we focused on this formula for total diabetes, rather than using diagnosed diabetes to indicate diabetes burden in the population. Moreover, by focusing on diabetes as diagnosed and undiagnosed disease, we minimised the possible impact on future diabetes prevalence of changes in screening practices for diagnosing diabetes during an ensuing time period.

Diagnosed diabetes was assessed as individuals who answered yes to a question of whether a doctor had told them they had diabetes. Undiagnosed diabetes was estimated on the basis of individuals who said they had not had a previous diagnosis of diabetes, but who had fasting plasma glucose (FPG) > 7.0 mmol/l. Although the diagnostic criteria for diabetes changed from FPG > 7.8 mmol/l to FPG > 7.0 mmol/l between the NHANES II and the NHANES III, we used the newer criteria so that the total diabetes burden was assessed by the same criteria at each point in time [21].
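The case definition above can be written as a small classification rule. The Python sketch below is illustrative only; the function and argument names are hypothetical and do not correspond to NHANES variable names.

```python
FPG_THRESHOLD_MMOL_L = 7.0  # newer criterion, applied at every time point


def diabetes_status(told_by_doctor, fpg_mmol_l):
    """Classify one respondent as 'diagnosed', 'undiagnosed' or 'none'."""
    if told_by_doctor:
        # Self-reported physician diagnosis takes precedence over the glucose value.
        return "diagnosed"
    if fpg_mmol_l > FPG_THRESHOLD_MMOL_L:
        return "undiagnosed"
    return "none"


def has_total_diabetes(told_by_doctor, fpg_mmol_l):
    """Total diabetes burden counts diagnosed and undiagnosed cases together."""
    return diabetes_status(told_by_doctor, fpg_mmol_l) != "none"
```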

Persons converting from a disease-free state to having diabetes

Although a variety of diabetes risk scores exist, most have been created from cross-sectional studies and have as their aim the identification of individuals with undiagnosed diabetes. Their ability to predict the development of diabetes is therefore unknown [13, 14, 16]. The risk score used in this study is based on one developed for the Atherosclerosis Risk in Communities (ARIC) cohort study [18]. Among individuals without diagnosed diabetes or FPG > 7.0 mmol/l, we used a scoring strategy which includes: high waist circumference (>102 cm in men, >88 cm in women), raised blood pressure (>130/85 mmHg or antihypertensive medications), low HDL-cholesterol (<1.03 mmol/l in men, <1.29 mmol/l in women), high triacylglycerol (>1.7 mmol/l), BMI > 30 kg/m2, and hyperglycaemia. Each of the characteristics is worth 1 point except for hyperglycaemia, which is worth 2 points if FPG is >5.6 mmol/l or 5 points if FPG is >6.1 mmol/l. A score of >4 puts an individual at high risk of developing diabetes, whether diagnosed or undiagnosed. A score of <4 indicates that a person has a low risk of developing diabetes.
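The scoring rule described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the published ARIC algorithm: the function and argument names are hypothetical, raised blood pressure is interpreted as systolic > 130 or diastolic > 85 mmHg or use of antihypertensive medication, and a score of exactly 4, which the text assigns to neither group, is returned as "unclassified".

```python
def risk_score(sex, waist_cm, systolic, diastolic, on_bp_meds,
               hdl_mmol_l, tg_mmol_l, bmi, fpg_mmol_l):
    """Sum the points for each risk characteristic (sex is 'male' or 'female')."""
    male = sex == "male"
    score = 0
    score += int(waist_cm > (102 if male else 88))                 # high waist circumference
    score += int(systolic > 130 or diastolic > 85 or on_bp_meds)   # raised BP (assumed reading)
    score += int(hdl_mmol_l < (1.03 if male else 1.29))            # low HDL-cholesterol
    score += int(tg_mmol_l > 1.7)                                  # high triacylglycerol
    score += int(bmi > 30)                                         # obesity
    if fpg_mmol_l > 6.1:                                           # hyperglycaemia: 5 or 2 points
        score += 5
    elif fpg_mmol_l > 5.6:
        score += 2
    return score


def risk_group(score):
    """Thresholds as stated in the text; a score of exactly 4 is left unassigned there."""
    if score > 4:
        return "high"
    if score < 4:
        return "low"
    return "unclassified"


# Hypothetical person: elevated FPG only.
example = risk_score("female", 80, 118, 76, False, 1.5, 1.1, 24.0, 6.4)
```

Because FPG above 6.1 mmol/l contributes 5 points on its own, any person with that glucose level falls in the high-risk group regardless of the other characteristics.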

This particular risk score was chosen for several reasons. First, it has moderate sensitivity (68%) and specificity (75%). Second, it is computed in a reasonably straightforward manner without having to use coefficients from the ARIC cohort that may be specific to that cohort. Third, data and results provided in the study by Schmidt et al. [18] allowed for computation of the rate of development of diabetes in both the high-risk group and the low-risk group. The ratio of development of diabetes in the high-risk group versus the low-risk group was 4.5:1. Variables needed to compute this diabetes risk score are available only in the NHANES III and the NHANES 1999–2002.
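One way to see how the 4.5:1 ratio feeds into incidence is the sketch below. It assumes, for illustration only, that incident cases are obtained by applying a 10-year conversion rate to the low-risk group and 4.5 times that rate to the high-risk group; the function name and all numbers are hypothetical, and the fitted age- and race-specific rates are not reproduced here.

```python
HIGH_TO_LOW_RATIO = 4.5  # ratio of diabetes development, high-risk versus low-risk group


def incident_cases(n_low_risk, n_high_risk, low_risk_rate):
    """Expected converters over the interval, given the low-risk conversion rate."""
    high_risk_rate = HIGH_TO_LOW_RATIO * low_risk_rate
    return n_low_risk * low_risk_rate + n_high_risk * high_risk_rate


# Made-up inputs: 100 low-risk and 50 high-risk persons, 2% low-risk conversion rate.
converters = incident_cases(100.0, 50.0, 0.02)
```

With these invented inputs the smaller high-risk group contributes more than twice as many converters as the low-risk group, which is why the size of the high-risk population matters for the projection.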

Although the ARIC diabetes risk score did not specifically consider race or age in the computation [18], we computed conversion rates for 10-year age classes for three race/ethnic groups (non-Hispanic Whites, non-Hispanic Blacks and Hispanic individuals) by fitting age categories for the data from 1991 to 2001 and then fitting race/ethnicity on to the same time change. We did not compute sex-specific conversion rates because sex was already differentiated in several of the variables in the ARIC diabetes risk score [18].

Migration of persons with diabetes

Migration of individuals with or without diabetes into the population can also affect future diabetes prevalence. Recent projections have included migration within their models [5]. Because we are looking at changes in diabetes prevalence among adults, migration of adults, particularly from ethnic minorities, could substantially affect the 10-year projections. We used data from the NHANES III to estimate migration of persons with diabetes in the 20 years and older age groups. The NHANES III measured how many years foreign-born immigrants had been in the US. Thus, we estimated the number of foreign-born individuals who had been in the country for 9 years or less for the total population as well as for different racial/ethnic groups. The NHANES III data allowed us to make estimates for non-Hispanic Whites, non-Hispanic Blacks and Hispanic individuals.

Persons with diabetes moving into the 20 to 29-year-old age class

For 2011, 2021 and 2031 the total number of persons with diabetes in the 20 to 29-year-old age class was estimated using a linear projection of the NHANES III and NHANES 1999–2002 data. The proportion of 20 to 29-year-olds with diabetes in each race/ethnic group was held constant at the proportions found in the NHANES 1999–2002 data at the later time intervals.
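The linear projection for the youngest age class can be sketched as a two-point extrapolation followed by a fixed split across race/ethnic groups. The Python below is a hedged illustration: the function names, the counts and the group shares are all made up, and the survey mid-points of 1991 and 2001 are used as the two anchor years.

```python
def linear_projection(year_a, value_a, year_b, value_b, target_year):
    """Extend the straight line through two observed points to target_year."""
    slope = (value_b - value_a) / (year_b - year_a)
    return value_b + slope * (target_year - year_b)


def split_by_group(total, shares):
    """Allocate a projected total using group shares that are held constant."""
    return {group: total * share for group, share in shares.items()}


# Made-up counts (millions) of 20 to 29-year-olds with diabetes at the two survey mid-points.
projected_2011 = linear_projection(1991, 0.4, 2001, 0.5, 2011)

# Hypothetical shares, standing in for proportions observed in the later survey.
by_group = split_by_group(projected_2011, {"group_a": 0.5, "group_b": 0.3, "group_c": 0.2})
```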

Mortality among individuals with diabetes

Diabetes mortality for the total population was based on data from the NHANES II mortality survey (1976–1992). This population-based cohort study was used to provide estimates of diabetes mortality, since mortality within the population affects future prevalence [20]. Diabetes mortality was estimated as all-cause mortality among individuals with diabetes (either diagnosed or undiagnosed) at baseline, rather than as mortality with diabetes listed as the cause of death. This definition is more consistent with the potential impact of diabetes on future prevalence. Mortality estimates were computed separately for the total population by age classes.

The NHANES II mortality cohort is based on a sample of individuals aged 30 to 75, whereas we made diabetes estimates on individuals aged 20 years and older. Consequently, we assumed no deaths due to diabetes in the 20 to 29-year-old age group over the 10-year period.

Population estimates

Total population of 10-year age classes was estimated using data from NHANES III for 1991, NHANES 1999–2002 for 2001, and US Census Bureau, Middle Series projections for 2011, 2021 and 2031 [22]. Total population of race/ethnic groups was also determined by 10-year age classes using the same sources of information.


In an effort to provide an estimate of future trends in diabetes and the population at high risk of developing diabetes, we employed the following procedure. We used the NHANES III data to fit a model to predict total diabetes in the NHANES 1999–2002. We used this strategy prior to making future projections, because it allowed us to develop and fit the model to an existing national estimate of diabetes prevalence. Because both the NHANES III and the NHANES 1999–2002 are based on multi-year data collection, we estimated a mid-point of 1991 and 2001 for the two surveys.

The number of persons with diabetes 10 years post-baseline was calculated for 10-year age classes by first adding baseline prevalence and incidence (the number of low-risk and number of high-risk persons who developed diabetes over the 10-year interval), then adding persons with diabetes who immigrated to the United States, and persons with diabetes who moved into the 20 to 29-year-old age class, and finally subtracting the number of diabetic subjects who died. Percentage of persons with diabetes was estimated for each time period by taking the total number of persons with diabetes and dividing by the expected total population, then multiplying by 100.

Varying model assumptions

Our initial predictions of future diabetes burden were based on the assumption of a constant proportion of individuals at high risk of diabetes at the levels present in the NHANES 1999–2002. To account for potential changes in the proportion of persons at high risk of diabetes, we also evaluated increases in the proportion of persons at high risk by 10, 20 and 30%, as well as estimates based on decreases in the proportion of persons at high risk by 10, 20 and 30%. Theoretically, it is unlikely that the proportion of persons at high risk will remain stable, because from NHANES III to NHANES 1999–2002 the proportion at high risk was seen to increase. Also, a major risk factor for diabetes, obesity, has increased substantially over a 40-year time period [10, 12]. We evaluated the effect of decreasing proportions at high risk, to account for the possibility that interventions to improve lifestyle of adults in the US may be effective.

In addition, to address the potential impact on mortality of healthcare interventions in management of diabetes, we examined potential reductions of 10, 20 and 30% in mortality among individuals with diabetes. Finally, we computed a model examining a combination of effects, assuming that lifestyle interventions would yield a 10% decrease of persons at high risk and healthcare interventions would yield a 10% decrease in mortality of persons with diabetes.
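The set of alternative assumptions can be enumerated programmatically. The sketch below is illustrative only and assumes that each percentage is a relative change applied to the baseline value (for example, a 10% decrease multiplies the baseline by 0.9); the function names and the baseline values in the example are hypothetical.

```python
def build_scenarios():
    """Baseline, one-at-a-time changes of 10, 20 and 30%, and the combined scenario."""
    scenarios = [{"name": "baseline", "risk_change": 0.0, "mortality_change": 0.0}]
    for pct in (10, 20, 30):
        change = pct / 100.0
        scenarios.append({"name": f"high risk +{pct}%", "risk_change": change, "mortality_change": 0.0})
        scenarios.append({"name": f"high risk -{pct}%", "risk_change": -change, "mortality_change": 0.0})
        scenarios.append({"name": f"mortality -{pct}%", "risk_change": 0.0, "mortality_change": -change})
    scenarios.append({"name": "high risk -10% and mortality -10%",
                      "risk_change": -0.10, "mortality_change": -0.10})
    return scenarios


def apply_scenario(high_risk_proportion, mortality_rate, scenario):
    """Scale the baseline inputs by the scenario's relative changes (an assumption)."""
    return (high_risk_proportion * (1.0 + scenario["risk_change"]),
            mortality_rate * (1.0 + scenario["mortality_change"]))


scenarios = build_scenarios()
# Made-up baseline inputs: 25% of adults at high risk, 20% ten-year mortality.
combined_inputs = apply_scenario(0.25, 0.20, scenarios[-1])
```

Each scenario's adjusted inputs would then be passed through the same prevalence model, so that differences in the projections can be attributed to the changed assumption alone.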


Table 1 shows estimates of the total diabetes burden from the NHANES III and the NHANES 1999–2002 and the future 10-year projections for 2011 through to 2031. The number of individuals at high risk of diabetes based on the multivariable diabetes risk score was 38.4 million in 1991 and 49.9 million in 2001. When the model was used to predict the known diabetes prevalence in 2001 from the 1991 data, the results were satisfactory and within 0.2% of the actual population prevalence of total diabetes. If the proportion of individuals at high risk within the adult population remains stable at 2001 levels, we could expect 55.8 million individuals at high risk in 2011, 60.9 million in 2021, and 66.1 million in 2031. The prevalence of diabetes is also projected to increase, from 6.3% in 1991 and 8.8% in 2001 to 14.5% in 2031, with 37.7 million adults having diagnosed or undiagnosed diabetes. Assuming stability in the population proportion of individuals at high risk of developing diabetes, the rate of increase in the number of individuals with diabetes and the proportion with diabetes tends to slow over time. Among individuals aged 30 to 39 years, who are not currently targeted for screening according to age, the prevalence of diabetes is expected to rise from 3.7% in 2001 to 5.2% in 2031.

Table 1 Number of people (in millions of persons) with and prevalence of diabetes by year and age category

Electronic supplementary material (ESM) Table 1 shows the projected prevalence of diabetes according to different racial/ethnic groups. Non-Hispanic White adults are projected to continue to have a lower prevalence of diabetes than both non-Hispanic Black and Hispanic individuals. By 2031, the Hispanic community will have an overwhelming diabetes burden, with more than 20% of the adult population having diabetes.

The projections in ESM Table 2 are based on different assumptions regarding changes in the number of individuals who are at high risk of developing diabetes and changes in mortality among individuals with diabetes. As might be expected, as mortality decreases the prevalence of diabetes increases in the subsequent 10 years. The estimate for 2031 indicates that a decrease in mortality combined with a decrease in the number of individuals at high risk of developing diabetes yields a prevalence similar to that achieved if the proportion at high risk is kept stable from 2001. All of these estimates indicate a larger diabetes burden among Hispanics.


This national projection of diabetes prevalence for the US is the first to model the projection on the number of individuals at high risk of developing diabetes using a multivariable risk assessment. Projections suggest a rising and substantial diabetes burden for the population. Hispanic adults will be most affected, with estimates suggesting that by 2031 more than 20% of the adult Hispanic community will have diabetes. These results are particularly worrisome for this community in light of recent evidence that the gap in healthcare quality between Hispanic and non-Hispanic White individuals has continued to widen [23].

Many previous diabetes projections have been limited to estimates of diagnosed diabetes and thus have lower estimates of projected diabetes burden, and have not incorporated an evaluation of the population at high risk of diabetes, with clinical indicators, into their models [5, 9]. Our estimates will be less likely to be affected by changes in screening strategies. Additionally, they incorporate potential changes in the level of risk for diabetes in the US population, a change which is likely given national trends in obesity [3]. Moreover, recent data have suggested that individuals with undiagnosed diabetes are similar to those with diagnosed diabetes with regard to the development of complications; thus our estimates are more robust in describing the burden of disease in the population [24].

Comparing our projections with those from other studies, we note that an estimate, published in 2006, for diagnosed diabetes in the US among individuals aged 20 to 64 years in 2030 is 16.8 million [25]. Our estimates are based both on diagnosed and undiagnosed diabetes, and our projection of total diabetes among that age group for 2031 is higher, namely 19 million. It is possible that estimates based solely on diagnosed diabetes could become more consistent with our estimates, if greater vigilance were shown for screening for undiagnosed diabetes. However, not accounting for the at-risk population in the estimates is likely to lead to inaccurate estimates. A comparison of our estimates of total diabetes with those of another study [7], which projected total diabetes but did not account for the population at high risk of developing diabetes, reveals that the latter’s projections are most probably underestimates. Using data from 1993, the investigators projected a population prevalence estimate of total diabetes in the US among individuals aged 20 years and older for the year 2000 to be 7.6%, while the NHANES 1999–2002 yielded a prevalence of 8.8%. For 2025 the same team [7] projected a prevalence of 8.9% versus 13.5% for 2021 in our study.

The results have several implications for the delivery of healthcare and healthcare financing.

First, we estimated our models under several assumptions for the number of individuals at high risk of diabetes in the population. Regardless of these assumptions, the US will have a substantial number of individuals at high risk of diabetes in 2011, 2021 and 2031. Interventions to modify lifestyle are critical to decrease the number of individuals at high risk, and consequently to lower the expected increase in diabetes in the future. Although some of the diabetes estimates suggest seemingly small decreases in future prevalence, based on decreases in the population at risk, the actual numbers are substantial. For example, a one-percentage-point drop in the 2031 US prevalence of diabetes among individuals aged 20 and older would be equivalent to 2,600,000 fewer people with diabetes.
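The figure of 2,600,000 can be checked against the 2031 projection reported in the results (37.7 million adults with diabetes at a prevalence of 14.5%). The short Python sketch below does the arithmetic; the variable names are illustrative, and the implied adult population is derived from those two reported values rather than taken from the census projections.

```python
projected_cases_millions = 37.7  # adults with diabetes projected for 2031
projected_prevalence_pct = 14.5  # projected prevalence for 2031

# Adult population implied by the two reported values (about 260 million).
implied_adults_millions = projected_cases_millions / (projected_prevalence_pct / 100.0)

# One percentage point of that population (about 2.6 million people).
cases_per_point_millions = implied_adults_millions * 0.01
```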

Second, the projection that a substantial proportion of the population will have diabetes indicates greater spending will be necessary to manage the disease. This will include spending on drugs, ongoing monitoring, and treating of complications including nephropathy, retinopathy, and cardiovascular disease.

Third, the disproportionate impact of diabetes on minorities, particularly Hispanics, demands new intervention strategies to decrease the number of individuals at high risk and to deliver care to individuals who have historically had poor access to care. Additionally, with the projected increase in diabetes prevalence among 30 to 39-year-olds, a population not currently targeted for screening, a re-examination of current public health policy and screening strategies may be warranted [26].

There are several strengths to the design of this study. One is that the study utilised multiple NHANES data sets, which have the advantage of allowing for nationally representative population estimates. Thus, the initial data used to fit the model as well as to make mortality estimates of diabetes, both diagnosed and undiagnosed, are nationally representative. Another strength is that this study is the first to make a nationally representative assessment of the at-risk population for development of diabetes and then use that assessment to model the future prevalence of diabetes. The assessment of risk used, moreover, is based on the ARIC diabetes risk score [18], a multivariable risk score that used clinical indicators.

When interpreting our results, however, several limitations need to be considered. First, although this is the first study to use a validated diabetes risk score to assess the high-risk population for the development of diabetes for the entire US population, potential limitations exist with regard to the diabetes risk score. The ARIC diabetes risk score [18] was based on a cohort of individuals aged 45 to 64 years at baseline and may therefore be limited when estimating diabetes development among individuals aged 20 years and older. However, we estimated diabetes prevalence in 10-year age increments. Moreover, the risk score's moderate sensitivity and specificity may cause the model to underestimate or overestimate future prevalence. A second possible limitation is that estimates of future disease burden are based on assumptions about the number at risk of disease and about mortality within the population. We have attempted to address this limitation by presenting the results of a sensitivity analysis, which includes variations in the proportion of the population at risk and in mortality. The third limitation is the diagnosis of diabetes in the NHANES data on the basis of a single FPG value. This strategy, although common in epidemiological studies, could potentially underestimate the prevalence of diabetes associated with isolated post-challenge hyperglycaemia, which occurs more commonly in women, the elderly, and in lean populations. It could also overestimate diabetes prevalence, because a clinical diagnosis of diabetes in asymptomatic patients requires two abnormal fasting glucose levels.

In summary, a continued focus on effective interventions for lifestyle modifications to decrease diabetes risk, as well as vigilant ascertainment of diabetes, appears crucial if the future prevalence and burden of diabetes in the US population are to be adequately addressed. This is especially important for minority populations, particularly the Hispanic community, which is projected to have an overwhelming future diabetes burden. Considering that minorities have historically had limited access to healthcare, these findings emphasise the importance of interventions targeting these populations.