Health literacy has been defined as “the ability to obtain, process, or understand basic health information needed to make appropriate health care decisions”1. Both inadequate (i.e., very low) and marginal health literacy (HL) appear to be important factors in the causal pathway to health disparities, especially in low-income patients with chronic diseases2-5. Given the high prevalence (46% of the US population) of inadequate plus marginal HL, often described as ‘limited HL’6, and limited literacy’s association with poor health outcomes3,7-12, there has been great interest in including HL assessments in epidemiologic and clinical research13. However, because standard HL measurements require face-to-face interviews14-16, take from 3 to 20 minutes, and cannot be administered by phone, they are often not feasible in large epidemiologic and public health research.

Chew and colleagues developed three self-reported HL “screening” questions and found that a single item about “confidence with completing forms,” with a response cut-point of “somewhat,” may be sufficient to detect patients with inadequate HL (C-index 0.74 [0.69-0.79]; sensitivity, 0.60; specificity, 0.82), but the items did not perform as well in patients with inadequate plus marginal HL (C-index 0.72 [0.69-0.76])17,18. Chew also found that a scale combining the three questions offered no additional benefit over the one question about confidence with forms. A recent review article endorsed the use of the ‘confidence with forms’ item to assess HL in clinical settings19. However, these self-reported items have only been validated among largely homogeneous English-speaking populations17,18,20. The performance of the self-reported HL questions within Spanish-speaking and ethnically diverse patient subgroups has not been assessed19.

It is important to validate these three self-reported HL items both individually and as a scale among Spanish speakers, low-income patients, and minorities because the prevalence of limited HL is highest among these groups6,21. HL and limited English proficiency have a complex relationship, adding to the importance of measuring HL in languages other than English22. However, Spanish HL assessment currently requires face-to-face, multi-item, interviewer-administered assessments23. Therefore, we examined the performance of three self-reported HL questions individually and as a summative scale among diverse, low-income, English- and Spanish-speaking populations with type 2 diabetes. We further explored whether the self-reported questions performed equally well across language, race/ethnicity, educational attainment, age, gender, and health status subgroups.


This validation study was nested within a trial of diabetes self-management support interventions in the San Francisco Department of Public Health (SFDPH). The methods have been previously reported24,25. Briefly, patients were included if they were over age 17 years, had ICD-9 codes consistent with type 2 diabetes, self-reported fluency in English or Spanish, made ≥1 primary care visit at one of four SFDPH clinics in the prior year, and had a hemoglobin A1c (HbA1c) value ≥8.0% at the time of recruitment. All participants provided informed consent, and the Committee on Human Research at the University of California, San Francisco approved the study protocol.

Self-Reported HL Measure

Bilingual research assistants administered the following three self-reported HL questions in person in English or Spanish: (1) How confident are you filling out medical forms by yourself? (¿Qué tan seguro(a) se siente al llenar formas usted solo(a)?) “confident with forms”; (2) How often do you have problems learning about your medical condition because of difficulty understanding written information? (¿Qué tan seguido tiene problemas aprendiendo sobre su condición médica porque es difícil entender información escrita?) “problems learning”; and (3) How often do you have someone like a family member, friend, hospital or clinic worker or caregiver, help you read hospital materials? (¿Qué tan seguido tiene usted, un familiar, un amigo(a), un empleado(a) del hospital o la clínica u otra persona que le ayude a leer materiales del hospital?) “help reading”17,18. The self-reported HL questions were translated into Spanish, back-translated, and extensively pilot-tested. For “confident with forms,” the response categories were “not at all, a little, somewhat, quite a bit, and extremely”17,20. For “problems learning” and “help reading,” response categories were “always, often, sometimes, rarely, or never”. To create the summative scale, responses were assigned a number from 1 to 5. For “confident with forms,” 1 was assigned for a Likert response of “extremely,” and 5 for “not at all”. For “problems learning” and “help reading,” number assignments were reversed. Scores ranged from 3-15, with higher scores reflecting worse self-reported HL.
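The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the dictionary and function names are hypothetical, and the values for the middle response categories (2 through 4) are assumed to follow the listed order, with “always” scored 5 and “never” scored 1 so that higher totals reflect worse self-reported HL.

```python
# Illustrative sketch of the summative scale; names are hypothetical.
# Only the endpoint scores for "confident with forms" are stated in the text;
# the intermediate values are assumed to follow the order of the categories.
CONFIDENCE_SCORES = {
    "extremely": 1,
    "quite a bit": 2,
    "somewhat": 3,
    "a little": 4,
    "not at all": 5,
}

# Assumed orientation for the two frequency items: "always" is the worst response.
FREQUENCY_SCORES = {
    "never": 1,
    "rarely": 2,
    "sometimes": 3,
    "often": 4,
    "always": 5,
}


def summative_score(confident_with_forms, problems_learning, help_reading):
    """Sum the three item scores (range 3-15; higher means worse self-reported HL)."""
    return (
        CONFIDENCE_SCORES[confident_with_forms]
        + FREQUENCY_SCORES[problems_learning]
        + FREQUENCY_SCORES[help_reading]
    )
```

Under these assumptions, a participant answering “somewhat,” “sometimes,” and “sometimes” would score 9, and the best and worst possible response patterns would score 3 and 15.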

Standard Health Literacy Measure

As the reference measure, we administered the validated short Test of Functional Health Literacy in Adults (sTOFHLA) in English and Spanish14. Higher scores (range 0-36) indicate better reading comprehension. We used standard cut-offs in which scores from 0-16 represent inadequate HL, 17-22 marginal HL, and 23-36 adequate HL14. sTOFHLA scores of 0-22 are collectively referred to as inadequate plus marginal HL. We assessed the performance of the self-reported questions and the summative scale compared to the sTOFHLA categories of inadequate (scores 0-16) and inadequate plus marginal literacy (scores 0-22).
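As a minimal sketch, the reference categories and the two binary outcomes used in the analysis could be derived from a raw sTOFHLA score as follows. The function names are hypothetical; only the cut-offs come from the text.

```python
def stofhla_category(score):
    """Map an sTOFHLA score (0-36) to the standard HL category."""
    if not 0 <= score <= 36:
        raise ValueError("sTOFHLA scores range from 0 to 36")
    if score <= 16:
        return "inadequate"
    if score <= 22:
        return "marginal"
    return "adequate"


def has_inadequate_hl(score):
    """First reference outcome: scores 0-16 versus 17-36."""
    return stofhla_category(score) == "inadequate"


def has_inadequate_or_marginal_hl(score):
    """Second reference outcome: scores 0-22 versus 23-36."""
    return stofhla_category(score) != "adequate"
```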

Patient Characteristics

We assessed self-reported: language, defined as the language in which participants chose to be interviewed (i.e., English or Spanish); race/ethnicity (Hispanic White, non-Hispanic White, non-Hispanic Black, Asian/Pacific Islander); educational attainment (< high school versus ≥ high school/GED); age (mean and <65 years versus ≥65 years); gender; and health status (fair-to-poor versus good-to-excellent) – patient characteristics which have been associated with HL level26. We considered race/ethnicity jointly because of our relatively modest sample size.


We used percentages and means to describe our study population. We calculated C-indices (the area under the receiver operating characteristic (ROC) curve) for each question and for multiple cut points of the summative scale for the HL categories of inadequate (comparing sTOFHLA scores of 0-16 versus 17-36) and inadequate plus marginal (comparing sTOFHLA scores of 0-22 versus 23-36). A C-index of 1.0 reflects perfect prediction, with both sensitivity and specificity being equal to 1. A C-index of 0.5 reflects discrimination no better than chance27. We also calculated multilevel likelihood ratios (LR) with 95% confidence intervals (CI), sensitivity, and specificity for each question and the summative scale. In cases of zero responses, a standard continuity correction was applied by adding 0.5 to all of the cells in the two-by-two table prior to computing the LR and the confidence interval28. We then assessed whether these questions and the summative scale were equally valid in analyses stratified by language. We used asymptotic methods to determine whether observed differences in the C-indices between the individual questions and the summative scale and between stratified language subgroups were statistically significant. Using the same methods, we also stratified by age, gender, educational attainment, health status, and race/ethnicity to ensure the questions were equally valid in diverse patient subgroups. This is particularly important among race/ethnic subgroups, because prior studies suggest health literacy may partly explain racial/ethnic health disparities29. In these subgroup analyses, comparisons were made between 48 pairs of subgroups; we therefore regarded a difference as statistically significant at a Bonferroni-corrected level of p < 0.00127,30.
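The quantities named above can be illustrated with a short, dependency-free Python sketch. It is not the analysis code used in the study: the function names are hypothetical, the C-index is computed through its equivalent pairwise (Mann-Whitney) form, the likelihood ratio is shown for a single response level without its confidence interval, and the 0.05 family-wise alpha in the Bonferroni step is an assumption.

```python
def c_index(scores_with_condition, scores_without_condition):
    """Area under the ROC curve via pairwise comparison.

    Returns the probability that a randomly chosen participant with the
    condition (e.g., inadequate HL) has a higher (worse) self-reported score
    than one without it; ties count as one half.
    """
    wins = 0.0
    for pos in scores_with_condition:
        for neg in scores_without_condition:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_with_condition) * len(scores_without_condition))


def level_likelihood_ratio(a, b, c, d):
    """Likelihood ratio for one response level from a two-by-two table.

    a: with condition, at this level      b: without condition, at this level
    c: with condition, at other levels    d: without condition, at other levels
    If any cell is zero, 0.5 is added to every cell (continuity correction).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (cell + 0.5 for cell in (a, b, c, d))
    return (a / (a + c)) / (b / (b + d))


def bonferroni_alpha(n_comparisons, family_alpha=0.05):
    """Per-comparison significance level; family_alpha = 0.05 is assumed."""
    return family_alpha / n_comparisons
```

For example, `bonferroni_alpha(48)` gives roughly 0.00104, which is in line with the p < 0.001 threshold stated above.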


Of 296 participants, 48% were Spanish-speaking, and only 9% were white, non-Hispanic (Table 1). Limited HL was prevalent: 47% had inadequate HL as measured by the sTOFHLA and 12% had marginal literacy. For the self-reported HL questions, 57% reported being confident with forms “somewhat” or less, 45% of participants reported problems learning “sometimes” or more frequently, and 42% reported needing help reading “sometimes” or more frequently.

Table 1 Patient Characteristics (N = 296)

Overall, participants who reported less confidence with forms (C-index 0.82, CI 0.77-0.87), more problems learning (C-index 0.72, CI 0.67-0.78), more need for help reading (C-index 0.68, CI 0.62-0.74), and higher summative scale scores (worse HL) (C-index 0.82, CI 0.77-0.86) were consistently more likely to have inadequate HL (sTOFHLA 0-16), as demonstrated by C-indices >0.5 (range for the questions and scale 0.68-0.84). Overall, these questions also successfully differentiated those with inadequate plus marginal HL (sTOFHLA 0-22) from those with adequate HL (sTOFHLA 23-36) (C-indices ranging from 0.69-0.81: “confident with forms,” C-index 0.81, CI 0.76-0.86; “problems learning,” C-index 0.74, CI 0.68-0.79; “help reading,” C-index 0.69, CI 0.64-0.75; scale, C-index 0.82, CI 0.77-0.87). The performance of the summative scale was not statistically significantly different from that of the “confident with forms” question (p for inadequate HL = 0.85; p for inadequate plus marginal HL = 0.77). Both the “confident with forms” item and the summative scale performed better than the other two questions for both inadequate and inadequate plus marginal HL (p < 0.01 for all comparisons).

In our stratified analyses by language, for inadequate (Table 2) and inadequate plus marginal HL (Table 3), the C-indices did not significantly differ between English and Spanish speakers. However, the three questions demonstrated higher sensitivity and lower specificity at any given cut point among Spanish speakers compared to English speakers. Sensitivity, specificity, and likelihood ratios were highest for the “confident with forms” question, among English and Spanish speakers, for identifying both inadequate HL (English, C-index 0.76; Spanish, C-index 0.74) (Table 2) and inadequate plus marginal HL (English, C-index 0.70; Spanish, C-index 0.80) (Table 3). For both inadequate (Table 2) and inadequate plus marginal HL (Table 3), a cut point of “somewhat” or less confident with forms, a cut point used in prior studies18, appeared to maximize both sensitivity and specificity for English speakers. However, for both literacy levels, a cut point of “a little” or less confident with forms functioned best among Spanish speakers (Tables 2 and 3). The test characteristics for the summative scale (see Online Appendix) demonstrate that a cut point of 9, corresponding to answers of “sometimes/somewhat” on all three questions, appeared to maximize both sensitivity and specificity for English and Spanish speakers.
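The text does not state a formal rule for choosing the cut point that “maximizes both sensitivity and specificity.” One common way to operationalize it, shown here purely as an assumption, is Youden's J statistic (sensitivity + specificity − 1). The function names and the toy data in the usage note are hypothetical and are not study results.

```python
def sensitivity_specificity(scores, has_condition, cut_point):
    """Sensitivity and specificity when scores >= cut_point are flagged.

    Assumes higher scores mean worse self-reported HL and that the sample
    contains at least one participant with and one without the condition.
    """
    tp = fp = fn = tn = 0
    for score, condition in zip(scores, has_condition):
        flagged = score >= cut_point
        if flagged and condition:
            tp += 1
        elif flagged and not condition:
            fp += 1
        elif not flagged and condition:
            fn += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def best_cut_point(scores, has_condition, candidates):
    """Return the candidate cut point with the largest Youden's J (assumed criterion)."""
    def youden_j(cut_point):
        sensitivity, specificity = sensitivity_specificity(scores, has_condition, cut_point)
        return sensitivity + specificity - 1
    return max(candidates, key=youden_j)
```

With toy summative scores `[3, 5, 8, 9, 12, 15]` and reference labels `[False, False, False, True, True, True]`, `best_cut_point(scores, labels, range(3, 16))` selects 9. Running the same function separately for each language subgroup is one way to explore whether subgroup-specific cut points differ.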

Table 2 Test Characteristics for Health Literacy Questions Compared to sTOFHLA Scores for Inadequate Health Literacy
Table 3 Test Characteristics for Health Literacy Questions Compared to sTOFHLA Scores for Inadequate + Marginal Health Literacy

In stratified analyses, after adjustment for multiple comparisons, we found that the self-reported questions performed well and consistently across age, gender, educational attainment, health status, and race/ethnicity subgroups for identifying inadequate HL. For inadequate plus marginal HL, there was slightly more variation between groups, but none of these differences were statistically significant (see Online Appendix; all P > 0.01).


Because of its well-established role in health outcomes and health disparities, HL is an important factor to study in public health and epidemiological research13. To our knowledge, this is the first study to test the performance of self-reported HL questions among an ethnically diverse, English- and Spanish-speaking population, and to compare the performance of the questions between language and other patient characteristic subgroups. We found that three self-reported HL questions could identify those with inadequate HL and those with inadequate plus marginal HL within this ethnically diverse, English- and Spanish-speaking population with a moderate degree of discrimination. The “confident with forms” question performed best among the individual items, both across language subgroups and across all other patient characteristic subgroups. The summative scale performed similarly to the individual “confident with forms” question.

Our findings build on previous studies of the three self-reported HL measures. As in prior studies17,18,20, the “confident with forms” question performed the best of the three questions. In contrast to prior work, we found that both the “confident with forms” question and the summative scale could discriminate moderately well between those with inadequate plus marginal vs. adequate HL, in addition to inadequate HL, for both English and Spanish speakers. For the “confident with forms” question, Chew et al. found a C-index of 0.72 for inadequate plus marginal HL, while we found a C-index of 0.81 for the overall sample. This is important because marginal HL, in addition to inadequate HL, has been associated with poor health outcomes including mortality and health disparities4,12,26. Because dose-response associations have been found between HL level and poor patient outcomes31, some investigators may want to identify both literacy level subgroups. Our results also mirror those of prior studies in finding similar performance between the “confidence with forms” item and the summative scale17.

In stratified analysis by language, the C-indices for the “confidence with forms” question were similar for Spanish and English speakers. However, the item seemed to have higher sensitivity but lower specificity among Spanish speakers at every cut point. The optimum cut point for the “confident with forms” question for English speakers that maximized both sensitivity and specificity was “somewhat” or less, while for Spanish speakers the optimum cut point was “a little” or less. These findings may be the result of cultural variation and/or Spanish-speaking participants responding to the “confident with forms” question for forms not only written in Spanish, but also in English. As such, researchers may want to consider different cut points for English and Spanish-speaking subgroups.

The utility of the “confident with forms” question and summative scale among the Spanish speakers in our population may also be affected by the relatively high prevalence of language concordant patient-physician dyads in this clinical setting and the ubiquitous access to Spanish transcription and translation services22. Patient–physician language concordance has been shown to be a powerful determinant of patient satisfaction with communication and may have leveled the playing field with their English-speaking counterparts in terms of patients feeling confident with forms22. As such, the self-reported measures in this population may have been detecting true HL deficits rather than those related to language discordance or limited English proficiency.

Because of a prior lack of brief, validated measures of HL for diverse populations, some have suggested using demographic characteristics to estimate HL32. This approach does not permit assessment of the independent effects of HL beyond demographic characteristics. This is important because HL levels have been shown to vary widely within patient demographic subgroups6. Therefore, we contend that independent measurement of HL, for example with the “confident with forms” question or summative scale, would contribute substantially to epidemiologic and clinical research. In the clinical setting, screening for limited health literacy is controversial, with current expert recommendations against routine screening32-34. However, in selected clinical situations, such as the prescribing of high-risk medications, screening for limited health literacy has been advocated, and the use of a single-item screener would be more feasible in busy clinical settings than standard literacy assessments19.

While imperfect in their precision, the summative scale, and specifically the single “confident with forms” question, have some clear advantages over direct, longer HL measurements. They are brief and can be administered via telephone. Our group has recently field-tested these questions both individually and as a scale within a large sample of diverse diabetes patients and has demonstrated robust, independent associations with a range of outcomes, including perceived need for self-management support35, higher rates of hypoglycemia36, and lower patient use of electronic health records37. While these studies did not assess performance of these items across demographic subgroups, these associations lend support to the items’ predictive validity.

Our study has some limitations. First, we included only patients with poorly controlled diabetes, which may limit generalizability to healthier populations. Second, this study was conducted at four sites within one county health care system and may not reflect regional differences. Third, in our practice environment there is excellent access to translation services and many physicians and staff speak Spanish. Results may differ for Spanish-speaking patients in different linguistic environments. Finally, our results reflect the criterion validity of the self-reported HL questions, i.e., their relationship with a gold-standard HL measurement. Further work is needed to establish predictive validity of these questions in relation to health outcomes of interest.

In summary, although limited HL is associated with a range of health outcomes, it is often not feasible to measure directly in clinical, epidemiologic, or public health studies because standard measurement tools are lengthy and cannot be administered by telephone. Our study suggests that the single self-reported “confident with forms” question or the summative scale of the three self-reported HL questions discriminate diverse English speakers and Spanish speakers with adequate HL from those with inadequate and inadequate plus marginal HL to a degree that warrants application and further assessment in epidemiologic and clinical research involving diverse populations.