Diabetes mellitus is the sixth leading cause of death in the United States (U.S.), accounting for approximately 70,000 deaths annually. Age-standardized adult diabetes death rates across U.S. states ranged from approximately 2 per 10,000 people in Arizona and Florida to 4.5 to 5 per 10,000 in West Virginia and the District of Columbia (D.C.) [1]. There may be two reasons for this large variation. First, diabetes prevalence may vary across states because of differences in risk factors for diabetes. For example, the prevalence of obesity in a number of Southern states is almost 60% higher than in Colorado, where obesity is lowest [2]. Second, there may be differences across states in the diagnosis and treatment of diabetes or of cardiovascular risks among diabetics. Reliable information on diagnosed and undiagnosed diabetes prevalence at the state level matters because states are key administrative units for funding and implementing programs that influence diagnosis and treatment.

Currently, the only source of information on diabetes prevalence at the state level is the Behavioral Risk Factor Surveillance System (BRFSS), a state-representative telephone survey. However, the BRFSS data are based on self-reports and do not provide estimates of undiagnosed diabetes. The National Health and Nutrition Examination Survey (NHANES) uses laboratory measurements and provides estimates of diagnosed and undiagnosed diabetes, but is representative only at the national level. In this study, we combined data from NHANES and BRFSS to estimate diabetes prevalence and diagnosis at the state level. Our results provide information for state diabetes prevention and control programs, and our methods can be used for regular low-cost monitoring of diabetes at the state level.


Data Sources

NHANES uses a complex multistage stratified clustered probability design to measure health and nutrition characteristics of a nationally representative sample of the civilian non-institutionalized population aged two months and older. NHANES includes an in-person interview and a subsequent physical examination and measurement component in a mobile examination clinic (MEC) or at home for those unable to visit the MEC. We used NHANES data from 2003 to 2006. The response rates for the household interviews were 80% for 2003-2004 and 79% for 2005-2006. The corresponding response rates for the medical examination after the household interview were 95 to 96%.

Each interviewed participant was randomly assigned to either a morning or afternoon/evening MEC session. Subjects ≥ 20 years old assigned to the morning session were asked to fast for 8 to 24 hours, with the exception of those on insulin or those who were excluded for other safety reasons. The NHANES MEC and fasting sample weights account for exclusion, non-response, and inappropriate fasting time. Additional information on NHANES design and methods, including on diabetes measurement, is available elsewhere [3, 4] and online

The BRFSS is an annual cross-sectional telephone health survey. Currently, the survey is conducted in all 50 states and the District of Columbia using random-digit dialing to obtain a state-representative sample of the civilian, non-institutionalized population aged 18 and over. In 2003, the response rate among eligible subjects who answered the phone was 77%. Additional information on the design is available elsewhere [5, 6] and online

We included adults aged 30 and older in NHANES and BRFSS who had answered the self-reported diabetes question, which asked if they had ever been told by a health professional that they had diabetes. The response rate for this question was more than 99.8% in both surveys. We did not include younger participants because diabetes prevalence is relatively low in these ages.

Statistical Analysis

Consistent with previous analyses [4], we defined total diabetes as either having answered yes to the diabetes diagnosis question: "Other than during pregnancy, have you ever been told by a doctor or health professional that you have diabetes or sugar diabetes?" or having a fasting plasma glucose (FPG) level of ≥ 126 mg/dL. We used FPG because it is used to define diabetes by the American Diabetes Association [7].
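The case definition above can be written as a small classification rule. The following is a minimal sketch, not the authors' code; the function and constant names are hypothetical, and only the 126 mg/dL threshold and the self-report criterion come from the text.

```python
# Illustrative sketch of the case definition; names are hypothetical.
FPG_THRESHOLD_MG_DL = 126.0  # fasting plasma glucose cut-off stated in the text


def classify_diabetes(self_reported, fpg_mg_dl):
    """Return 'diagnosed', 'undiagnosed', or 'none' for one participant.

    self_reported: True if the participant answered "yes" to the
        diabetes diagnosis question.
    fpg_mg_dl: fasting plasma glucose in mg/dL, or None if not measured.
    """
    if self_reported:
        return "diagnosed"
    if fpg_mg_dl is not None and fpg_mg_dl >= FPG_THRESHOLD_MG_DL:
        return "undiagnosed"
    return "none"


def has_total_diabetes(self_reported, fpg_mg_dl):
    """Total diabetes = self-reported diagnosis OR FPG at or above the threshold."""
    return classify_diabetes(self_reported, fpg_mg_dl) != "none"
```

Note that a participant without a fasting measurement can only be classified from self-report in this sketch.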

We used data from NHANES, which is representative at the national but not at the state level, to characterize the relationship between undiagnosed diabetes status (defined as FPG ≥ 126 mg/dL) and a set of health system, sociodemographic, and risk factor variables listed in Table 1 using a logistic regression. These variables were selected a priori based on their potential association with diabetes prevalence. We excluded education from the primary list of predictors as including it did not improve the fit of the model. In addition, 50.2% of observations in NHANES were missing either smoking or insurance status or both. We used a missing indicator to include these observations in the regression model. The regression incorporated appropriate sampling weights.
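As an illustration of the missing-indicator approach mentioned above, the sketch below shows one common way to code a covariate with missing values so that the observation stays in the regression. The variable names, category labels ("current", "uninsured"), and row layout are assumptions for illustration; the actual coding used in the analysis is given in Table 1.

```python
# Illustrative sketch of missing-indicator coding; labels are hypothetical.
def encode_with_missing_indicator(value, positive_label):
    """Encode a binary covariate as (dummy, missing_flag).

    A missing value (None) gets dummy = 0 and missing_flag = 1, so the
    observation is retained and the "missing" category receives its own
    regression coefficient.
    """
    if value is None:
        return (0, 1)
    return (1 if value == positive_label else 0, 0)


def build_design_row(record):
    """Build one design-matrix row: intercept, smoking terms, insurance terms."""
    smoker, smoker_missing = encode_with_missing_indicator(
        record.get("smoking"), "current")
    uninsured, insurance_missing = encode_with_missing_indicator(
        record.get("insurance"), "uninsured")
    return [1, smoker, smoker_missing, uninsured, insurance_missing]
```

Rows built this way would then be passed, together with the survey sampling weights, to a weighted logistic regression routine.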

Table 1 Description of the outcome and explanatory variables from NHANES and BRFSS and the corresponding odds ratios (OR) and 95% confidence intervals (95% CI).

We estimated the individual-level probability of having diabetes in BRFSS 2003-2007 in two steps. First, participants who had answered "yes" to the diabetes diagnosis question were, by definition, assigned a probability of 1.0 for having diabetes. Second, the probability of having undiagnosed diabetes (i.e., FPG ≥ 126 mg/dL) for those who answered "no" to this question was estimated using the coefficients of the logistic regression fit on the NHANES dataset. Estimates of diabetes prevalence and diabetes diagnosis by age, sex, and state were obtained from the BRFSS using appropriate sample weights. Undiagnosed diabetes is the difference between total diabetes and self-reported diabetes. In separate analyses, we used linear regressions to model the relationship between FPG as a continuous variable and self-reported diabetes diagnosis, medication use, and the health system, sociodemographic, and risk factor variables in Table 1 (results for the continuous FPG analysis are available from the authors on request). We used Stata version 10 (StataCorp, Texas) for all analyses. We present the results in two age groups: 30-59 and ≥ 60 years.
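The two-step procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout, the covariate names, and any coefficient values are hypothetical, and the real coefficients are those of the NHANES logistic regression in Table 1.

```python
import math


def predicted_probability(record, coefficients):
    """Probability that one survey respondent has diabetes.

    Step 1: a self-reported diagnosis gives probability 1.0 by definition.
    Step 2: otherwise, apply the logistic-regression coefficients to the
    respondent's covariates (which include an intercept term equal to 1).
    """
    if record["self_reported"]:
        return 1.0
    linear = sum(coefficients[name] * value
                 for name, value in record["covariates"].items())
    return 1.0 / (1.0 + math.exp(-linear))


def weighted_prevalence(records, coefficients):
    """Survey-weighted total, diagnosed, and undiagnosed prevalence (as fractions)."""
    total_weight = sum(r["weight"] for r in records)
    total = sum(r["weight"] * predicted_probability(r, coefficients)
                for r in records) / total_weight
    diagnosed = sum(r["weight"] for r in records
                    if r["self_reported"]) / total_weight
    # Undiagnosed diabetes is total diabetes minus self-reported diabetes.
    return {"total": total, "diagnosed": diagnosed,
            "undiagnosed": total - diagnosed}
```

In practice the same function would be applied within each age, sex, and state cell to obtain the stratified estimates.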


The national prevalence of diabetes among U.S. adults ≥ 30 years was 13.7% (95% CI 12.0%, 15.4%) for men and 11.7% (95% CI 10.4%, 13.0%) for women in the pooled 2003-2006 NHANES. Nationally, approximately 32% of all diabetes cases in 2003-2006 were undiagnosed, a percentage that has changed little since 1999-2002 [4].

Regression results

Among those who answered "no" to having been diagnosed with diabetes, being male and being older were associated with a higher probability of having diabetes (Table 1). The effect of age on diabetes risk was largest in those 60 to 69 years old and declined slightly in those ≥ 70 years old, consistent with the available evidence on the age association of blood glucose [8]. Overweight and obesity were associated with higher prevalence of undiagnosed diabetes, with obese participants (body mass index, BMI ≥ 30 kg/m2) having 4.29 times (95% CI 2.25, 8.17) the odds of having undiagnosed diabetes compared with normal-weight participants. After controlling for all other factors, Hispanics had twice (95% CI 1.07, 3.83) the odds of having undiagnosed diabetes compared with whites, and the uninsured had 1.58 (95% CI 0.83, 3.02) times the odds compared with insured subjects.
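For readers who want to relate the reported odds ratios to the underlying logistic-regression coefficients, the sketch below converts between the two. It assumes a Wald-type interval that is symmetric on the log-odds scale with z = 1.96; the source does not state how the intervals were computed, so the recovered coefficient and standard error are approximations.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald confidence limits from a coefficient and its standard error."""
    return (math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se))


def coefficient_from_or(or_point, ci_low, ci_high, z=1.96):
    """Approximate coefficient and standard error implied by a reported OR and CI.

    Assumes the interval is symmetric on the log scale.
    """
    beta = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return beta, se


# Obesity odds ratio reported in the text: 4.29 (95% CI 2.25, 8.17).
beta_obese, se_obese = coefficient_from_or(4.29, 2.25, 8.17)
or_obese, low_obese, high_obese = odds_ratio_with_ci(beta_obese, se_obese)
```

The round trip reproduces the reported interval to within rounding, which is consistent with (but does not prove) a log-scale Wald interval.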

We evaluated the performance of the prediction model using both internal and external validation. For internal validation, we applied the regression coefficients to NHANES 2003-2006 observations (i.e., the same data used in estimating the regression model) to predict diabetes prevalence. The differences between the predicted and actual diabetes prevalence for different age, sex, and race groups were on average 0.5 percentage points and at most 8.4 percentage points. The Pearson correlation coefficient for the observed and predicted diabetes prevalence for different age, sex, and race groups was 0.98. For external validation, we applied the coefficients of regressions estimated using the 2003-2006 rounds to the same variables in pooled data from two previous rounds of NHANES (1999-2000 and 2001-2002). The observed-predicted differences for individual age, sex, and race groups were at the extreme slightly worse than those in the internal validation; specifically, the 60- to 69-year-old males from "other race" had a 20 percentage point discrepancy. This may, however, be because the composition of this race group changed between the two surveys. The Pearson correlation coefficient for the observed and predicted diabetes prevalence for different age, sex, and race groups was 0.93. On average, the predicted prevalence was 0.1 percentage points higher than the actual prevalence (versus 0.5 percentage points lower in the internal validation).
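The validation summaries reported above (mean difference, largest absolute difference, and Pearson correlation between observed and predicted group prevalences) can be computed as in the following sketch. The function name and the input layout (two equal-length lists of group-level prevalences in percent) are assumptions for illustration.

```python
import math


def validation_summary(observed, predicted):
    """Compare observed and predicted group prevalences (both in percent).

    Returns the mean signed difference (predicted minus observed), the
    largest absolute difference, and the Pearson correlation coefficient.
    """
    n = len(observed)
    diffs = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    cov = sum((o - mean_obs) * (p - mean_pred)
              for o, p in zip(observed, predicted))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    return {
        "mean_diff": sum(diffs) / n,
        "max_abs_diff": max(abs(d) for d in diffs),
        "pearson_r": cov / math.sqrt(var_obs * var_pred),
    }
```

Each list entry would correspond to one age, sex, and race group; the same function serves for both the internal and the external validation.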

National-level prevalence of diabetes and undiagnosed diabetes

The predicted national prevalence of diabetes in 2003-2007 was 14.4% (14.3%, 14.5%) for men and 11.4% (11.3%, 11.5%) for women. The only sociodemographic group whose predicted and measured prevalences were significantly different was the uninsured, who had an actual prevalence of 9.2% (7.4%, 11.0%) but a predicted prevalence of 11.9% (11.6%, 12.2%).

State-level prevalence of diabetes and undiagnosed diabetes

In 2003-2007, the lowest prevalence of diabetes was in the Midwest and the Northeast, including Vermont, Minnesota, Montana, and Colorado, with age-standardized prevalence ranging from 11.0% to 12.2% for men and 7.3% to 8.4% for women (Figure 1 and Table 2). Diabetes prevalence was highest in the primarily Southern and Appalachian states, including Mississippi, West Virginia, Louisiana, Texas, South Carolina, Alabama, and Georgia, where age-standardized diabetes prevalence was 15.8% to 16.6% for men and 12.4% to 14.8% for women, i.e., approximately 30% to 51% higher for men and 48% to 103% higher for women than the states with lowest prevalence. The same geographic pattern was observed when younger (30-59 years) and older (≥ 60 years) age groups were considered separately. The Spearman rank correlation coefficient of state diabetes prevalence and mean BMI was 0.53 for men and 0.76 for women [2].
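The relative differences quoted above can be checked with simple arithmetic. The sketch below assumes that the lower bound of each range compares the low end of the high-prevalence group with the high end of the low-prevalence group, and that the upper bound compares the opposite extremes; this pairing is an inference from the numbers, not something the text states.

```python
def percent_higher(high, low):
    """Relative excess of `high` over `low`, in percent."""
    return 100.0 * (high - low) / low


# Age-standardized prevalence ranges (percent) quoted in the text.
men_low, men_high = (11.0, 12.2), (15.8, 16.6)
women_low, women_high = (7.3, 8.4), (12.4, 14.8)

# Smallest gap: bottom of the high range versus top of the low range.
# Largest gap: top of the high range versus bottom of the low range.
men_range = (percent_higher(men_high[0], men_low[1]),
             percent_higher(men_high[1], men_low[0]))
women_range = (percent_higher(women_high[0], women_low[1]),
               percent_higher(women_high[1], women_low[0]))
```

Under this pairing the results round to 30% and 51% for men and to 48% and 103% for women, matching the figures in the text.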

Table 2 Estimated prevalence (sampling standard error)* of total diabetes by state, age, sex, race and insurance status (Figures show actual prevalence; age-standardized figures available from authors).
Figure 1 Estimated prevalence of total diabetes by state, sex, and age group. Within each age group, figures are age-standardized to the 2000 U.S. population.
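Direct age standardization, as used for these estimates, is a weighted average of age-specific prevalences with weights taken from a standard population. The sketch below illustrates the calculation; the age bands and weights shown are placeholders chosen for illustration and are NOT the 2000 U.S. standard population weights.

```python
def age_standardize(prevalence_by_age, standard_population):
    """Directly age-standardized prevalence.

    prevalence_by_age: age band -> prevalence (percent) in the study population.
    standard_population: age band -> population count or share in the standard.
    Weights are renormalized over the age bands that are present.
    """
    total_weight = sum(standard_population[band] for band in prevalence_by_age)
    weighted_sum = sum(prevalence_by_age[band] * standard_population[band]
                       for band in prevalence_by_age)
    return weighted_sum / total_weight


# Placeholder inputs for illustration only (not real survey or census values).
example_prevalence = {"30-39": 4.0, "40-49": 8.0, "50-59": 14.0}
example_weights = {"30-39": 0.40, "40-49": 0.35, "50-59": 0.25}
example_standardized = age_standardize(example_prevalence, example_weights)
```

Because every state is weighted to the same age structure, differences in standardized prevalence are not driven by differences in the states' age distributions.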

Age-standardized diabetes prevalence was higher in men than women in all states, with the largest differences in Minnesota, Colorado, Utah, and Maine, where prevalence in men was 32% to 38% higher than among women. The smallest male-female differences were in the District of Columbia, Mississippi, West Virginia, and Louisiana, ranging from 6% to 18% (Figures 1 and 2). Men also had higher prevalence of diabetes than women in almost all states and age groups, except in the youngest ages (30 to 39 years), consistent with the national results from NHANES. Correlation between age-standardized male and female diabetes prevalence across states was 0.9.

Figure 2 Relationship between male and female diabetes prevalence, by age. Each data point corresponds to one state.

The age-standardized proportions of diabetes cases that were undiagnosed were lowest in Hawaii, Mississippi, West Virginia, and Tennessee (19.5% to 21.4% of all diabetes cases) and highest in Minnesota, Montana, North Dakota, Vermont, and Colorado (31.1% to 33.3% of all diabetes cases). However, the absolute prevalence of undiagnosed diabetes, as a percent of total population, was highest in New Mexico, Texas, Florida, and California (3.5 to 3.7 percentage points) and lowest in Montana, Oklahoma, Oregon, Alaska, Vermont, Utah, Washington, and Hawaii (2.1 to 3.0 percentage points) (see Table 3 for prevalence of undiagnosed diabetes by age, sex, race and insurance status).
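The contrast between the two measures above follows from their definitions: the undiagnosed proportion divides by total diabetes prevalence, whereas the absolute prevalence is expressed relative to the whole population. The sketch below uses two hypothetical states with made-up values to show how the rankings can diverge.

```python
def undiagnosed_share(total_prevalence, undiagnosed_prevalence):
    """Percent of all diabetes cases that are undiagnosed."""
    return 100.0 * undiagnosed_prevalence / total_prevalence


# Hypothetical values (percent of population), for illustration only.
state_a = {"total": 15.0, "undiagnosed": 3.0}   # high-prevalence state
state_b = {"total": 9.0, "undiagnosed": 2.7}    # low-prevalence state

share_a = undiagnosed_share(state_a["total"], state_a["undiagnosed"])
share_b = undiagnosed_share(state_b["total"], state_b["undiagnosed"])
```

Here the hypothetical state A has more undiagnosed diabetes in absolute terms (3.0 versus 2.7 percentage points) but a smaller undiagnosed share of cases (20% versus 30%), the same pattern seen when comparing high-prevalence and low-prevalence states.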

Table 3 Estimated prevalence (sampling standard error)* of undiagnosed diabetes by state, age, sex, race, and insurance (Figures show actual prevalence; age-standardized figures available from authors).

Men in all states had higher proportions of undiagnosed diabetes than women, with the male-female difference in undiagnosed proportion being largest in Hawaii, Mississippi, District of Columbia, West Virginia, and Idaho, where the proportion undiagnosed among men was 34.1% to 39.0% higher than among women. The male-female diagnosis disparity was smallest in Colorado, Pennsylvania, Vermont, and Minnesota (12.9% to 19.8%). When stratified on race, the proportion of cases undiagnosed was highest among Hispanics (33%), followed by whites (28%) and blacks (19%), and it was lowest in the residual group of "other races" (6%). One-third of diabetes cases were undiagnosed in participants who did not have insurance compared to one-fourth among insured Americans.


To our knowledge, this is the first study to estimate the total prevalence of diabetes and the proportion of diabetes that is undiagnosed at the state level. The Southern and Appalachian states had the highest diabetes prevalence, with Mississippi faring the worst. The Northern plains, the Northeast, and the Midwest had the lowest prevalence. Prevalence of undiagnosed diabetes also varied across states, with Southern states and California having the highest prevalence. The proportion of undiagnosed diabetes was higher in men, Hispanics, and the uninsured than in women, whites, and the insured. In fact, one-half of diabetes cases were undiagnosed in uninsured Hispanic men. These findings are important for the development and implementation of adequate state programs to prevent, diagnose, and control diabetes.

This analysis has a number of limitations. First, although our regression models included important sociodemographic, lifestyle, and health system determinants of diabetes risk and diagnosis, there are other factors that affect diabetes, such as diet and quality of care [9, 10]. For instance, we were unable to include family history of diabetes, physical activity, alcohol use, and specific dietary risk factors of diabetes [11-14] in the model because BRFSS does not include a sufficiently detailed dietary questionnaire or any questions on family history of diabetes and because the questions used to measure alcohol use and physical activity are different from those used in NHANES. The effects of some such factors may be captured by the variables in our model (e.g., self-reported diabetes, BMI, smoking, insurance status, and visit to a doctor). If the unexplained effects vary systematically across states, the model may underestimate cross-state variation in diabetes prevalence, making our results conservative. Second, we conducted our analysis using FPG because of its availability for the most recent rounds of NHANES and because it is used by the American Diabetes Association to define diabetes. Other definitions of diabetes, e.g., based on a glucose tolerance test, may have led to slightly different estimates. Third, the BRFSS response rate varies across states. This may affect the state comparisons if the determinants of non-response are associated with diabetes prevalence. The single best way to reduce uncertainty in our analysis would be the addition of a validation component to BRFSS that includes measured blood glucose for a random sample of interviewees. Finally, because 50.2% of observations in NHANES were missing either smoking or insurance status, we used a missing indicator in our regression models to include these observations. Dropping these observations would decrease the precision of our regression coefficients but would not materially affect the predictions of diabetes prevalence by state.

Despite uncertainties, our results currently provide the only estimates of total diabetes and undiagnosed diabetes in U.S. states, and should provide motivation, guidance, and benchmarks for designing, implementing, and evaluating diabetes prevention and control programs at the state level. Further, our methods allow states to combine the relatively low-cost BRFSS telephone survey with NHANES to regularly monitor the prevalence of diabetes and progress in diabetes diagnosis.

Increasing the coverage of lifestyle (e.g., physical activity) and pharmacological interventions for diabetes should be a priority in states with high diabetes prevalence. Some states also need to improve diagnosis, especially among men, because early diagnosis and intensive glycemic control reduce the future incidence of microvascular complications [15, 16]. Further, diabetes diagnosis will facilitate interventions that lower blood pressure and cholesterol, and hence the risk of cardiovascular disease, among diabetics [17, 18]. The states with the highest estimated diabetes prevalence in our analysis also have the highest levels of blood pressure and cardiovascular disease risk [19, 20]. This geographical distribution of cardiovascular risks and diabetes points to the need for lifestyle and health care interventions that reduce blood pressure and other cardiovascular risks in high-diabetes states.