Low health literacy (LHL) remains a formidable barrier to reducing gaps in health care quality and improving outcomes. Approximately one-third of the population (36%) is estimated to have basic or below basic health literacy,1 defined as the “degree to which individuals have the capacity to obtain, process, and understand basic health information and services needed to make appropriate health decisions”.2 Individuals with LHL find it difficult to understand directions for taking medicine, to calculate a dose of an over-the-counter medication for a child or comprehend a consent form.3,4 LHL may also contribute to suboptimal care and outcomes through lower participation in screening programs,5 reduced ability to act on and understand the advice of a health professional,6 and limited ability to access and navigate the health care system.7,8

Although health literacy and general literacy may be linked, researchers contend that the complexity of the health care system, the medical jargon used by many providers,9 and the exposure to novel health concepts (many times while under a great deal of stress), have the potential to negatively impact one’s health literacy skills, even among those with adequate literacy.10 Therefore, the prevalence of limited literacy is even higher when considered within a health context.11

Despite the availability of direct measures of health literacy, including the Rapid Estimate of Adult Literacy in Medicine (REALM), the Test of Functional Health Literacy in Adults (TOFHLA), and the Newest Vital Sign,12–15 the role of LHL in health outcomes remains largely unaddressed in public health and clinical practice. While these measures and other screening questions can be used by providers to identify individual patients at higher risk for LHL, administering such measures is logistically complex, and the measures themselves are limited largely to assessing reading ability and medical vocabulary and do not provide much, if any, information on other skills integral to health literacy.16,17 Such measures were also designed for individual, rather than community-level, assessments and provide little information about the level of health literacy within one’s patient population or community overall. Thus, there is a need for a predictive model that can use currently available data to help medical and public health practitioners, researchers, and health centers identify whether LHL may be a significant problem in the community or population they serve. The development of such a model may help set the stage for the development of community-level measures, thus advancing action and facilitating efforts to target health literacy interventions in the practice setting and in areas of greatest need in the community at large.

Absent a predictive model, some providers use level of educational attainment or income as a proxy for health literacy, a practice that may lead to under- or over-estimation of the roles of each. While studies have identified individual characteristics associated with health literacy such as lower educational attainment, older age, lower income, and minority race or ethnicity,12,13,15,18–22,23 it is not clear whether using a combination of predictive factors in the form of a multivariable model is significantly more accurate than relying on a single variable.

Only recently have studies attempted to examine how these social factors work together11,21,24,25 and few, if any, of these models have included other constructs hypothesized to predict health literacy such as marital status, rurality,26 language spoken in home,1 and length of residence in the U.S. While the existing multivariable models demonstrate the utility, feasibility, and validity of such predictive models of health literacy, each has limitations. In Canada, for example, a multivariable model predicting health literacy included constructs such as daily reading at home and at work in addition to demographic characteristics.11 The amount of daily reading at home was the strongest predictor of health literacy in this model, yet such measures are not readily available in administrative or census data, reducing the model’s utility to generate community-level estimates. Recent U.S. models of health literacy have also been developed using data from elderly and/or Medicare populations. However, these models may have limited applicability to the general population in that the association of health literacy with demographic characteristics may vary with the age of the population of interest. Some have argued, for example, that among the elderly, income is not a strong predictor of health literacy, as many are no longer employed.27

Unfortunately, there have not been attempts to examine combinations of known predictors of LHL in a nationally representative sample of U.S. adults. Developing such a predictive model has significant potential to advance efforts to address poor quality care and outcomes related to health literacy by providing practitioners and public health officials with information on the average health literacy of the community they serve. Individuals and organizations serving communities with lower average health literacy may then target and implement a range of additional supports and strategies to increase individuals’ access to and understanding of health information.

As a first step toward overcoming the limitations of existing multivariable models, and to provide clinicians and health care providers with a means to estimate the health literacy of the community they serve, we developed two related models of health literacy that can be applied to widely available census data. Both used an identical set of demographic factors to predict health literacy as measured by the National Assessment of Adult Literacy (NAAL), a large, nationally-representative survey of adults in the United States. The first model estimates a mean health literacy score; the second estimates the probability of having health literacy skills in the “above basic” range (i.e., intermediate or proficient).1 We also examine whether such models are better predictors of community health literacy as compared to commonly used proxies such as education or income.



We used data from the 2003 NAAL household sample, an in-person assessment of literacy among a nationally-representative sample of 18,541 community-dwelling US adults (16 years and older) conducted by the U.S. Department of Education28 (response rate 60.1 percent29). The goal of the NAAL was to measure literacy by assessing the extent to which individuals could understand and use written materials encountered in everyday activities (e.g., reading a bus schedule or newspaper editorial). The NAAL included a component specific to health literacy, assessing the ability to effectively use health-related materials (e.g., medication label, written directions from doctor, consent form) to accomplish specific tasks, and was the first large-scale assessment of health literacy in the United States. Twenty-eight of the 152 NAAL items assessed health literacy, with each individual responding to about a third of the questions as part of the randomized block design of the NAAL survey.30 Our predictive models utilized data from the health literacy component of the NAAL specifically. More information about the NAAL, its multi-stage sampling design, and scoring procedures can be found in the 2006 National Center for Education Statistics report.1 Our analytic sample was limited to those who were 18 years of age or older, and were not missing items used to construct their health literacy score (n = 17,466).

Study Variables

Health Literacy

The NAAL assessed health literacy on a 0 to 500 point scale (mean = 245, standard deviation = 55).1 The National Research Council also classified these continuous scores into four ordinal performance categories to reflect an individual’s ability to successfully complete tasks of a given difficulty: below basic (0–184), basic (185–225), intermediate (226–309), and proficient (310–500).1 About 14% of the population had below basic health literacy skills, indicating the ability only to perform tasks such as circling the date on an appointment slip. About 22% of individuals had basic health literacy skills, indicating, for example, the ability to give two reasons why a person should be tested for a specific disease, using information in a clearly written pamphlet. 53% of the population had intermediate health literacy skills; such individuals can perform moderately challenging health literacy activities, including determining at what time a person can take a prescription medication, using information on the prescription drug label. Only about 12% of the population had proficient health literacy, indicative of the skills necessary to perform more complex and challenging literacy activities, such as calculating one’s personal share of employer health costs using a table.1
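The performance categories described above amount to a simple lookup against three cut points. The following is a minimal sketch of that mapping, not code from the NAAL or from this study; the function names are hypothetical, and the handling of fractional scores between the published integer ranges (for example, 184.5) is an assumption.

```python
from bisect import bisect_right

# Lower bounds of the basic, intermediate, and proficient ranges, taken from
# the ranges quoted in the text: 0-184, 185-225, 226-309, 310-500.
CUTPOINTS = [185, 226, 310]
LEVELS = ["below basic", "basic", "intermediate", "proficient"]


def naal_level(score: float) -> str:
    """Return the performance category for a 0-500 health literacy score.

    Assumption: a fractional score falls in the category whose lower bound
    it has reached, so 184.5 is treated as below basic.
    """
    if not 0 <= score <= 500:
        raise ValueError("health literacy scores range from 0 to 500")
    return LEVELS[bisect_right(CUTPOINTS, score)]


def is_above_basic(score: float) -> bool:
    """True for intermediate or proficient, the dichotomous outcome."""
    return naal_level(score) in ("intermediate", "proficient")
```

For instance, `naal_level(245)` places the reported mean score in the intermediate category, and `is_above_basic(225)` is false because 225 is the top of the basic range.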

Socio-demographic Predictors

Variables included in the model had to meet two criteria: (1) they had to be available in both the NAAL and census data, given our focus on developing a model to estimate and map community-level health literacy, and (2) they had to be established or strongly hypothesized to be associated with health literacy. We included as predictors gender, age, race/ethnicity, education, poverty status, marital status, residence in a metropolitan statistical area (MSA), language other than English spoken in home, and years residing in the United States (parameterized categorically as shown in Table 1). We used six categories of educational attainment; however, given that younger individuals (18–24) may not have completed their education at the time of the NAAL, we present estimates of the association between educational attainment and health literacy separately for individuals 18–24 years of age and 25 or older. Income was collected by the NAAL in 2003 dollars. We used this information to generate categories representing income relative to the federal poverty limit (FPL), as defined by the U.S. Census Bureau.31

Table 1 Sample Demographics (N = 17,466)


We developed two predictive models of health literacy. The first is a linear regression model predicting the mean (or average) health literacy score using the continuous form of that outcome. The second is a probit model predicting the proportion of the population scoring above the basic level of health literacy (the complement of “basic or below basic” health literacy), coded so that positive coefficients correspond to better health literacy in both models. To assess the extent to which these multivariate models add predictive strength over a single predictor, we compared the model coefficients and r-squared of the multivariate linear model to bivariate models using only educational attainment or income as a predictor.
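The two functional forms can be illustrated with a short sketch. It is not the fitted model: every coefficient below is a hypothetical placeholder chosen only to make the example run, and the predictor names are invented for illustration. The linear model returns an expected score; the probit model passes a linear index through the standard normal cumulative distribution function to obtain a probability of scoring above basic.

```python
from statistics import NormalDist

# HYPOTHETICAL placeholder coefficients for illustration only.
# These are NOT the estimates reported in Table 2.
LINEAR_COEFS = {"intercept": 250.0, "less_than_high_school": -40.0, "age_75_plus": -35.0}
PROBIT_COEFS = {"intercept": 0.5, "less_than_high_school": -0.9, "age_75_plus": -0.8}


def linear_index(coefs: dict, indicators: dict) -> float:
    """Intercept plus the sum of coefficient times 0/1 indicator value."""
    return coefs["intercept"] + sum(
        coefs[name] * value for name, value in indicators.items()
    )


def predict_mean_score(indicators: dict) -> float:
    """Linear model: expected health literacy score on the 0-500 scale."""
    return linear_index(LINEAR_COEFS, indicators)


def predict_prob_above_basic(indicators: dict) -> float:
    """Probit model: P(above basic) = Phi(linear index)."""
    return NormalDist().cdf(linear_index(PROBIT_COEFS, indicators))
```

Because both outcomes are coded so that higher values mean better health literacy, a negative coefficient lowers both the predicted score and the predicted probability, which makes the two sets of estimates directly comparable in sign.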

Given that, by design, each individual responded to only about one-third of the NAAL items, we employed the standard imputation and analysis methods that correspond to such a design.32 For each individual, five estimated health literacy scores were generated based on their responses to the NAAL items; these estimates capture uncertainty based on different individuals being asked different items. The five estimates are integrated into a single set of means, regression coefficients, and p-values using a standard process of averaging the results of five parallel regression models. All analyses included probability weights and accounted for clustering. Analyses were conducted using SAS 9.1.
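As an illustration of how five parallel sets of results might be combined, the sketch below pools one coefficient across the five estimated scores. The point estimate is the simple average described above. The variance formula is the standard multiple-imputation combining rule (Rubin's rules), which is an assumption here, since the text does not spell out the exact procedure used; the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pool_plausible_values(estimates, std_errors):
    """Combine one coefficient estimated separately on each set of scores.

    estimates:  the coefficient from each parallel regression
    std_errors: the matching design-based standard errors
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two estimates with matching standard errors")
    pooled = mean(estimates)
    # Average sampling variance within each regression.
    within = mean([se ** 2 for se in std_errors])
    # Variance between regressions: uncertainty from individuals
    # having answered different subsets of items.
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)
```

The between-regression term is what distinguishes this from fitting a single model to the averaged score: it carries the measurement uncertainty into the standard errors and p-values.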


Table 1 presents the weighted percentages of sample characteristics as well as the mean health literacy score (averaging the five parallel values) and percent scoring ‘above basic’. Forty-eight percent of the study population was male, 71% was White, 11% Black, and 12% Hispanic. One-seventh (14%) had less than a high school diploma and 27% reported an income below 200% of the federal poverty limit. The majority resided in an MSA, had never been married, were born in the U.S., and spoke English at home. Only 6% had lived in the U.S. less than 10 years. As expected, older individuals, minorities, those with less education, lower incomes, those who were divorced, widowed or separated, and those who had been living in the U.S. for fewer years had lower mean health literacy.

Predictive Models of Health Literacy

Table 2 presents the results of the multivariate models predicting health literacy as a continuous and dichotomous (‘above basic’) outcome. All variables, with the exception of living in an MSA and language spoken in home, made significant independent contributions to the models. Linear regression results are presented as unstandardized regression coefficients; probit results are presented as marginal effects, which can be interpreted as the change in probability of having ‘above basic’ health literacy with a one unit change in the predictor. The adjusted R² for the linear regression model and the probit model33 indicate that these demographic predictors accounted for 30% and 21% of the variance in health literacy scores, respectively. Educational attainment was strongly positively associated with health literacy, with the 40.7 point mean difference between the lowest and highest categories among those 25 and older corresponding to 0.74 standard deviations, a “large” effect.34 Blacks and Hispanics had health literacy scores that averaged 0.6 standard deviations lower than non-Hispanic whites; mean health literacy for those aged 75 and older was 0.7 standard deviations lower than those aged 18–24. To a lesser extent, health literacy was lower for those with lower incomes and recent immigrants.
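The standardized effect sizes quoted above follow from dividing a difference in scale points by the population standard deviation of 55 points. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
# Standard deviation of the NAAL health literacy scale, as reported in the text.
NAAL_SD = 55.0


def standardized_difference(point_difference: float, sd: float = NAAL_SD) -> float:
    """Express a difference in scale points in standard deviation units."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return point_difference / sd


# The 40.7 point education gap among those 25 and older:
education_gap = round(standardized_difference(40.7), 2)  # 0.74
```

Read in the other direction, a difference of 0.6 standard deviations corresponds to about 33 scale points.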

Table 2 Linear (column 2) and Probit (column 3) Multivariate Models Predicting Health Literacy (N = 17,466)

Comparison of Multivariate to Bivariate Models

The linear and probit multivariate models each explained approximately twice as much variance as a model using educational attainment alone (30% vs. 15.5% for the linear model, 21% vs. 10% for the probit; results not shown). Similarly, a model using only income as a predictor explained only 11% of the variance for the linear model and 8% for the probit. Applying Cohen’s criteria for multivariate shared variance, the effect sizes for the individual variables can be classified as small to medium, while the effect size for the multivariable model can be classified as medium (for probit) to large (for linear model).35


Using a nationally representative sample, we developed two predictive models of health literacy: one estimating mean health literacy, and one estimating the probability of having health literacy skills in the ‘above basic’ (intermediate or proficient) range. Lower educational attainment, racial/ethnic minority status, older age, lower income, and recent immigration to the U.S. were associated with lower estimated health literacy. Individuals who were not married also had lower health literacy, on average, although the association was much weaker (p < 0.05).

While the results of these models are consistent with previous work in this area, several findings merit further comment. First, despite controlling for a host of related characteristics, race and ethnicity were strongly associated with health literacy. The strength of this association was somewhat surprising, although it may be explained in part by unmeasured factors such as quality of education, which are correlated with both race/ethnicity and health literacy. Schools serving a high proportion of minority students, for example, are less likely to offer advanced placement courses and to have effective teachers in terms of years of experience and number of teachers with certifications in their primary teaching field.36 Given that racial/ethnic minorities tend to cluster in both inner-cities and rural areas where the quality of education may be lower, this may help to explain the observed racial/ethnic differences in health literacy.

Somewhat surprising was the lack of association between language spoken in the home and health literacy. Results from our models suggest that recent immigration to the U.S., rather than language spoken at home per se, is a stronger predictor of health literacy. Note, however, that our models were based on NAAL data, which assesses health literacy in the English language. Therefore, one’s health literacy skills may be higher in their native language than estimated by our models.

Another unexpected finding was that no difference in health literacy was found between those living in rural and urban locations. Results, however, may be limited by the only available measure of rurality in the NAAL: a dichotomous measure of MSA. It is more likely that health literacy follows an inverse U-shaped curve, where health literacy is lower, on average, among individuals residing in rural or urban areas, with individuals in suburban areas having higher health literacy, on average. Finally, the multivariate model was a stronger predictor of health literacy and explained substantially more of the variance than commonly used health literacy proxies such as educational attainment or income.

Several limitations to these models are worth noting. First, the NAAL assessed health literacy using only printed materials. As a result, our models focus on the ability to read materials to accomplish health-related tasks. They do not predict oral language (speaking) or aural language (listening) skills, which have been cited as important components of health literacy.16,17 Consequently, predictions of health literacy based on our models will not fully capture a broader conceptualization of health literacy. Second, although characteristics in our model were selected based on existing research findings and theoretical justification, there are likely other unmeasured characteristics that contribute to health literacy that were not included in the model, such as quality of education and state of residence. To assess the potential for regional variation in the models, we conducted stratified analyses for the four U.S. Census regions (Northeast, Midwest, South, West; results not shown). Models predicting the mean health literacy score for each region explained between 27% and 31% of the variance in health literacy scores. While there were minor regional differences in the models, none of these were statistically significant (F test = 0.48, p-value = 0.99 for linear models; F test = 0.28, p-value = 1.0 for probit models).

These predictive models of health literacy expand our understanding of factors that contribute to low health literacy in the general population, and allow us to estimate the average health literacy of communities. In so doing, individuals and organizations serving communities with lower average health literacy may then target and implement a range of additional supports and strategies to increase individuals’ access to and understanding of health information. This includes, for example, offering in-depth patient counseling with nursing staff or health educators, where important information related to diagnosis, treatment and follow-up can be discussed using plain language and in a less intimidating environment. A significant advantage of such models is that, when applied to census data and well-defined geographic areas such as census tracts, the average health literacy of a region can be mapped, providing visual insight into local areas or “hot spots” of lower average health literacy within the community, further helping promote effectively and appropriately targeted action to reduce disparities, poor quality care and poor outcomes related to limited health literacy.
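One way the linear model could be applied to census data is sketched below. Because that model is linear in 0/1 indicators, the predicted mean for an area equals the intercept plus each coefficient multiplied by the share of residents in the corresponding category. This is an illustrative assumption about the workflow rather than the authors' documented procedure, and the coefficients, category names, and tract shares in the example are hypothetical placeholders, not values from Table 2 or from any census file.

```python
def community_mean_score(intercept: float, coefs: dict, shares: dict) -> float:
    """Predicted mean health literacy score for one geographic area.

    coefs:  coefficient for each non-reference category (HYPOTHETICAL here)
    shares: proportion of the area's adults in each of those categories
    """
    for name, share in shares.items():
        if not 0.0 <= share <= 1.0:
            raise ValueError(f"share for {name!r} must be between 0 and 1")
    return intercept + sum(coefs[name] * share for name, share in shares.items())


# HYPOTHETICAL inputs for one census tract, for illustration only.
example_coefs = {"less_than_high_school": -40.0, "below_poverty": -20.0}
example_shares = {"less_than_high_school": 0.25, "below_poverty": 0.5}
tract_mean = community_mean_score(250.0, example_coefs, example_shares)  # 230.0
```

The same shortcut does not carry over to the probit model, which is nonlinear: the proportion above basic would need to be averaged over individuals or demographic cells rather than evaluated at the area's average characteristics.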