BACKGROUND

Over 33 million U.S. residents are foreign-born.1 Twenty-two million persons have limited English proficiency (LEP),2 defined as “a limited ability to listen, speak, read, and write in English, and speak[ing] English less than ‘very well’”.3 From 1990 to 2000, 46 states experienced an increase in their LEP populations, 15 of which had an increase of over 100% and 14 others of over 50%.2 In New York City alone, nearly one-fourth of the population has LEP.4 Over 50% of the city’s Spanish-speaking population has LEP.4

Hospitals often call upon untrained staff or bystanders to interpret. Untrained interpreters are prone to editing, polishing, omissions, additions, substitutions, volunteered opinions, and confidentiality breaches.5 Patients who communicate through untrained, “ad hoc” interpreters are less satisfied with their patient–provider relationship than those in same-language encounters.6 In encounters with bilingual nurses not trained in interpreting, approximately one-third of the uncomplicated and two-thirds of the complicated cases resulted in interpreting errors.7

Inadequately addressed language discordance between patient and provider inhibits LEP patients from receiving clinically indicated care, contributing to health disparities.8,9 Linguistic barriers limit patient education and adversely impact patient understanding and health outcomes.10,11 When information is not accurately understood or expressed, unnecessary, hazardous, or expensive diagnostic tests may be ordered, and indicated tests omitted.12 Conversely, interpreter services have led to the improved use of preventive services resulting in cost savings.13 A systematic review of studies on professional interpreters in the medical encounter revealed that the use of professional interpreters results in improved clinical outcomes compared with ad hoc interpreters.14

Health care facilities are mandated to address interpreting needs. Title VI of the 1964 Civil Rights Act requires that all federally funded institutions make “reasonable” attempts to provide meaningful access to language services for LEP patients.15 Various strategies have been employed to bridge language gaps.16,17 Simultaneous interpreting is a near word-for-word running rendition performed within milliseconds of the original speech—nearly simultaneously—almost like a voiceover. In consecutive interpreting, the interpreter listens as the primary speaker speaks, and then interprets only once the primary speaker has finished. The person interpreting can either be located in the exam room or not. In proximate interpretation, the person interpreting is in the room with the provider and patient. In remote interpretation, the interpreter is located outside the interview room but is linked to the provider and patient through telecommunication. Remote simultaneous interpreting is most commonly associated with the United Nations, and is often referred to as UN-style interpreting.18 Remote simultaneous interpreting has also been used effectively in the court system.19

Little is known about the impact of various interpreting strategies on interpreting speed and errors. This pilot study addresses this important gap by determining the accuracy and speed of four different medical interpreting strategies: remote simultaneous medical interpreting (RSMI), remote consecutive medical interpreting (RCMI), proximate consecutive medical interpreting (PCMI), and proximate ad hoc interpreting.

METHODS

Four scripted clinical encounters were run across the 4 interpreting methods, for a total of 16 interpreted encounters that generated the data for this study. Scripted encounters were used to enable the comparison of equivalent clinical content across interpreting modes. Six trained interpreters were randomly assigned to interpret in the 12 encounters involving RSMI, RCMI, and PCMI. In addition, 4 untrained, ad hoc personnel were each randomly assigned to interpret in encounters employing 1 of the 4 scripts (proximate ad hoc). The interpreters and ad hoc personnel were blind to the study design. Error analyzers, who were blinded to the method of interpreting they were analyzing, then scored each interpretation for linguistic and medical errors.

Script Development

Four patient–physician dialogues were prepared in English representing common primary care cases, including diabetes mellitus, tuberculosis testing, depression, and menopause. Each dialogue involved history taking, a brief physical exam, and discussions of diagnosis, testing, and treatment. The 4 dialogues were similar in length (1,366–1,500 words) and could be read in 6–9 minutes in English. The dialogues were constructed at similar levels of linguistic and medical difficulty. The patient portion was translated into Spanish and then backtranslated by the study team linguist to ensure accuracy.

Error Coding Methodology

Based on a review of the translation, interpreting, and linguistics literature,20–24 an error coding methodology was developed to measure both linguistic and medical errors (meaningful linguistic errors with medical information). Linguistic errors consisted of additions, omissions, or substitutions. Additions occurred when the interpreter added any language not uttered by the speaker(s). Omissions referred to lexical items present in the source language that were left out by the interpreter. Substitutions occurred when the interpreter substituted material different from what was uttered by the speaker(s). A linguistic error was also considered to be a medical error if the language made reference to something that was medically related. Medical errors were considered clinically significant if they were likely to impact clinical decision-making and outcomes. Once an error was determined to be medically related, the severity of the potential clinical consequences was noted. Five different categories were possible: clinically insignificant, mildly clinically significant, moderately clinically significant, highly clinically significant, and potentially life threatening. An error could fall into any of the four “clinically significant” categories only if it altered the history in any way or had the potential to alter clinical outcomes. The severity of an error related to the level of harm that the misinterpretation could result in. For example, errors pertaining to dosages of medicine were considered highly clinically significant, although not life-threatening, as no medications were actually prescribed. Misinterpreting the age at which a relative died of breast cancer as 65 instead of 55 would have been considered mildly clinically significant.
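As an illustration only, the coding scheme described above could be represented with a small data structure. The class and field names below (`LinguisticError`, `Severity`, `CodedError`) are hypothetical and were not part of the study's instruments; the sketch simply encodes the three linguistic error types and the five severity categories.

```python
from dataclasses import dataclass
from enum import Enum, IntEnum
from typing import Optional


class LinguisticError(Enum):
    """The three linguistic error types named in the coding methodology."""
    ADDITION = "addition"
    OMISSION = "omission"
    SUBSTITUTION = "substitution"


class Severity(IntEnum):
    """The five severity categories, ordered from least to most severe."""
    CLINICALLY_INSIGNIFICANT = 0
    MILD = 1
    MODERATE = 2
    HIGH = 3
    POTENTIALLY_LIFE_THREATENING = 4


@dataclass(frozen=True)
class CodedError:
    """One coded error (hypothetical record layout, for illustration)."""
    kind: LinguisticError
    medically_related: bool
    severity: Optional[Severity] = None

    def __post_init__(self):
        # Severity is only rated once an error is judged medically related.
        if self.medically_related and self.severity is None:
            raise ValueError("a medically related error needs a severity")
        if not self.medically_related and self.severity is not None:
            raise ValueError("severity applies only to medical errors")

    def is_clinically_significant(self) -> bool:
        # Any of the four categories above "clinically insignificant".
        return self.severity is not None and self.severity >= Severity.MILD


# Example from the text: a dosage error was rated highly clinically significant.
dose_error = CodedError(LinguisticError.SUBSTITUTION, True, Severity.HIGH)
# A purely linguistic addition carries no severity rating.
filler = CodedError(LinguisticError.ADDITION, False)
```

Using an ordered enumeration keeps threshold checks (for example, "mild or greater") to a single comparison.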

Timing and Scoring for Errors

Ten people interpreted, including 6 trained interpreters who had received a standard, 64-hour training in consecutive and simultaneous interpreting (both proximate and remote) and 4 untrained personnel who served as ad hoc “interpreters”. Of the trained interpreters, the years of experience ranged from 0 to 2.75. Of the untrained “interpreters”, the years of experience ranged from 0 to 30. All 10 people interpret in a municipal clinic/hospital setting. Each of the trained interpreters was randomly assigned to 2 of the 3 trained interpreting methods (RSMI, RCMI, PCMI) across different scripts, resulting in 12 interpreted encounters using trained interpreters. No interpreter interpreted for the same script twice or in the same mode twice. Each of the 4 untrained “interpreters” participated once as a proximate ad hoc interpreter, each interpreting a different script. During each encounter, the assigned script was read aloud by an English-speaking physician and by a Spanish-speaking volunteer “patient”. Each physician–patient pair read each of the 4 scripts 4 times. Each time a script was read, it was interpreted using a different modality, resulting in 16 separate encounters. All encounters occurred in clinic-like settings and were audiotaped. A stopwatch was started at the beginning and stopped at the conclusion of each script reading. To ensure that any time-related learning curve of the doctor–patient pair would not influence study outcomes, we randomized the order of the four interpreting modalities for each script.
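The assignment constraints for the trained interpreters (12 script-by-mode encounters, 2 per interpreter, never the same script or the same mode twice) can be sketched as follows. The study does not describe its randomization procedure; the rejection-sampling approach, the function name `assign_trained_interpreters`, and the interpreter labels are assumptions made for illustration.

```python
import random
from itertools import product

SCRIPTS = ["diabetes", "tuberculosis", "depression", "menopause"]
MODES = ["RSMI", "RCMI", "PCMI"]  # the three trained-interpreter modes


def assign_trained_interpreters(n_interpreters=6, seed=0):
    """Randomly give each interpreter 2 (script, mode) encounters.

    Assumed procedure (not taken from the study): shuffle the 12
    script-by-mode cells, pair them off, and retry until every pair
    differs in both script and mode.
    """
    rng = random.Random(seed)
    cells = list(product(SCRIPTS, MODES))
    if len(cells) != 2 * n_interpreters:
        raise ValueError("need exactly two encounters per interpreter")
    while True:
        rng.shuffle(cells)
        pairs = [cells[i:i + 2] for i in range(0, len(cells), 2)]
        # Accept only if no interpreter repeats a script or a mode.
        if all(a[0] != b[0] and a[1] != b[1] for a, b in pairs):
            return {
                f"interpreter_{k + 1}": tuple(pair)
                for k, pair in enumerate(pairs)
            }


assignment = assign_trained_interpreters(seed=1)
for name, encounters in assignment.items():
    print(name, encounters)
```

Rejection sampling is simple to verify and is fast at this scale; a larger design would likely call for a constructive method instead.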

A bilingual English–Spanish medical linguist who was blind to the study aims and design transcribed the 16 interpreted encounters. The transcriptions for each interpretation mode included the provider, patient, and interpreter portions. Two other blinded bilingual translators then checked the transcriptions. There was no indication in the transcriptions as to the interpreting mode or interpreter. All error reviewers underwent a half-day training on coding for errors.

Two of the authors (LO, KP) separately reviewed and scored all transcripts for linguistic and medical errors. A medically trained linguist (JG) was available to assist with any technical questions during linguistic error discussion, reviewed all linguistic decisions, and adjudicated any disagreements. A third bilingual clinician reviewer (FG) with extensive clinical, linguistic, and cultural competence experience participated in the medical error assessment, also to adjudicate disagreements. LO and KP are both bilingual. LO is a native Spanish-speaker and KP received formal education and medical training in Spanish and lived in a Spanish-speaking country. Both are fluent in the ability to speak, understand, read, and write in Spanish and English. JG is a native Spanish-speaker. All the error reviewers were blind to the interpreter and to the interpreting method in the transcripts.

Each transcript was divided into utterances, each of which constituted a phrase. Each utterance was scored using the described error coding methodology. Concordance between the medical error analyzers was 98.2%.
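The text does not state how concordance was calculated. A minimal sketch, assuming it is simple percent agreement between two coders' utterance-level codes, is shown below; the function name and the toy codes are hypothetical and are not study data.

```python
def percent_agreement(codes_a, codes_b):
    """Percentage of utterances to which two coders gave the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("both coders must score the same utterances")
    if not codes_a:
        raise ValueError("no utterances to compare")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)


# Toy example (not study data): the coders agree on 4 of 5 utterances.
coder_1 = ["none", "mild", "none", "moderate", "none"]
coder_2 = ["none", "mild", "none", "high", "none"]
agreement = percent_agreement(coder_1, coder_2)
```

Percent agreement does not correct for chance agreement; a chance-corrected statistic such as Cohen's kappa would be an alternative, but the source does not indicate which was used.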

Data Analysis

Summary statistics (mean and standard deviation) were produced to summarize the outcome variables (time, number of linguistic errors, and number of potential medical errors). For the analysis of time, the analysis unit was encounters. For the number of linguistic errors and potential medical errors, we utilized utterances as units of analysis. Medical errors of greater than mild clinical significance were aggregated into a single outcome measure of “potential medical error”. Regression analysis was conducted to evaluate the effect of interpreting mode and identify risk factors associated with the number of medical errors. The units of analysis are not truly independent as they are part of a larger whole (same interpreter, or same script, or utterances that are in close proximity to one another in the script), all of which may influence their association with potential medical error. To handle this complex data structure, a log-linear mixed model was used in the regression analysis. The log-linear mixed model combines the log-linear model with random effects to facilitate analysis of count data, while allowing structured dependence among units of analysis.24 In this model, the number of potential medical errors was the outcome variable; the interpreter was included as a random effect; and interpreting mode (RSMI versus other), script (diabetes, menopause, depression, tuberculosis), complexity of the utterance (more than 1 concept, more than 10 words, moderate/complex), and interpreter-specific variables (completed training, more than 1 year experience) were considered as fixed effects. This model evaluated the effect of interpreting mode, while allowing correlations among the utterances from the same interpreter and adjusting for the effect associated with script, utterance complexity, and training and experience associated with the interpreters. The model fitting consisted of the following 3 steps, and the interpreter was included as a random effect throughout. We first examined the relation between each fixed effect and the outcome of potential medical error to obtain the unadjusted odds ratio. Second, a full model was constructed by including all fixed effects in the model. This allowed us to evaluate the effect of RSMI, in an attempt to correct for the impact that would be attributable to script type, script complexity, and interpreters. Lastly, to identify the important risk factors associated with potential medical errors, a final model was obtained by the backward elimination method, and only those variables that were significant at the 5% level were included in the model.
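One common way to write a model of this kind is given below. The notation is ours, and the Poisson distribution for the counts is an assumption, since the text specifies only a log-linear mixed model for count data. Here Y_ij is the number of potential medical errors in utterance j interpreted by interpreter i; nonRSMI_ij indicates a non-RSMI mode; s_ij, c_ij, and t_i are the script, utterance-complexity, and interpreter training/experience indicators; and b_i is the interpreter-level random effect.

```latex
\begin{align*}
Y_{ij} \mid b_i &\sim \mathrm{Poisson}(\mu_{ij}), \\
\log \mu_{ij} &= \beta_0
  + \beta_1\,\mathrm{nonRSMI}_{ij}
  + \boldsymbol{\beta}_2^{\top}\mathbf{s}_{ij}
  + \boldsymbol{\beta}_3^{\top}\mathbf{c}_{ij}
  + \boldsymbol{\beta}_4^{\top}\mathbf{t}_{i}
  + b_i, \\
b_i &\sim \mathcal{N}(0, \sigma_b^2).
\end{align*}
```

Under this formulation, exp(β_1) is the adjusted ratio of the potential medical error rate per utterance for non-RSMI modes relative to RSMI, and the shared term b_i induces correlation among utterances handled by the same interpreter.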

RESULTS

In terms of interpreting speed, RSMI encounters averaged 12.72 vs 18.24 minutes for the next fastest method of interpreting (proximate ad hoc) (p = 0.002) (Table 1).

Table 1 Time and Linguistic and Medical Errors of Moderate or Greater Clinical Significance, by Interpreting Method

The 16 encounters yielded 1,909 utterances. Of these, 1,185 contained more than 1 meaningful concept. Across the different interpreting modes, RSMI produced far fewer errors than the other modes, whose error rates clustered at a significantly higher level (Table 1). RSMI had a mean of 1.139 (SD = 1.737) linguistic errors per utterance and 0.019 (SD = 0.15) medical errors of moderate or greater clinical significance per utterance (Table 1).
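Per-utterance means and standard deviations of this kind can be computed from utterance-level error counts grouped by interpreting mode. The sketch below uses toy counts, not study data; the function name is hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, as the text does not specify which form was reported.

```python
from collections import defaultdict
from statistics import mean, stdev


def errors_per_utterance_by_mode(records):
    """Return {mode: (mean, sample SD)} of error counts per utterance.

    `records` is an iterable of (mode, n_errors) pairs, one per utterance.
    """
    by_mode = defaultdict(list)
    for mode, n_errors in records:
        by_mode[mode].append(n_errors)
    return {
        mode: (mean(counts), stdev(counts) if len(counts) > 1 else 0.0)
        for mode, counts in by_mode.items()
    }


# Toy utterance-level counts (illustrative only, not study data).
toy_records = [
    ("RSMI", 0), ("RSMI", 1), ("RSMI", 2),
    ("PCMI", 2), ("PCMI", 4),
]
summary = errors_per_utterance_by_mode(toy_records)
```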

The regression analysis showed that non-RSMI interpreting modalities were associated with a 12-fold greater rate of potential medical errors (of moderate or greater significance) per utterance compared to RSMI (p = 0.0002), after adjusting for script type, interpreter’s experience and training, and utterance complexity. Other factors associated with potential medical errors included the number of concepts per utterance and interpreter experience. Utterances with more than 1 concept were associated with a 2.76 times greater rate of medical errors of moderate or greater clinical significance compared to utterances with only 1 concept. Interpreters with less than 1 year of experience made 2.78 times more potential medical errors (of moderate or greater significance) per utterance than interpreters with more than 1 year of experience (p = 0.0022).
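In a log-linear model, effects add on the log scale and therefore multiply on the rate scale. The sketch below illustrates that arithmetic using the three rate ratios reported above. It assumes that all three estimates come from the same final model, that there are no interaction terms, and that the rounded "12-fold" figure can be treated as 12.0; the combined figure is an illustration of the model structure and not a result reported by the study.

```python
import math

# Rate ratios as reported in the text ("12-fold" is treated as 12.0).
RATE_RATIOS = {
    "non_rsmi_mode": 12.0,
    "more_than_one_concept": 2.76,
    "under_one_year_experience": 2.78,
}


def combined_rate_ratio(active_factors):
    """Multiply rate ratios by summing their log-scale coefficients.

    Assumes a log-linear model with no interactions between factors.
    """
    log_sum = sum(math.log(RATE_RATIOS[name]) for name in active_factors)
    return math.exp(log_sum)


# A multi-concept utterance, interpreted in a non-RSMI mode by an
# interpreter with under 1 year of experience, relative to the
# reference case (RSMI, single concept, more experienced interpreter).
worst_case = combined_rate_ratio(RATE_RATIOS)  # roughly 92
```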

Examples of clinically significant medical errors included the following:

Mild clinical significance:

Doctor: Well, nowadays there are many different treatment options available. We could try using Wellbutrin 150 mg twice a day, and see what happens. This is a very effective drug and I think it would be very helpful. What do you think?

Interpreter: Sí, hoy en día hay diferentes opciones. Podemos tratar esta nueva que se llama Robutrin y coger 150 mg. diarios. ¿Cree usted que podemos tratar esto? (Yes, there are different options nowadays. We can try this new one called Robutrin and get 150 mg daily. Do you think we can try this?)

Moderate clinical significance:

Doctor: You might otherwise get bad wounds without even feeling pain.

Interpreter: De otra forma, usted podría sentirse mejor sin sentir dolor. (Otherwise, you might feel better without feeling pain.)

High clinical significance:

Doctor: I’m going to have to introduce a small speculum to take a look at your cervix and do a new PAP smear.

Interpreter: Ella va a hago un... una instrumento... para chequear... para sacarlos unas cosas de ahí. (She is going to I do an... an instrument... to check... to take them out some things from there.)

Potentially life threatening:

Patient: Sí, pero merezco sentirme así. He cometido muchos errores en mi vida y supongo que tengo que pagar por ello ahora. No merezco vivir más. Quisiera estar muerto. (Yes, but I deserve to feel this way. I have made many mistakes in my life and I suppose that I have to pay for it now. I don’t deserve to live anymore. I wish I were dead.)

Interpreter: Ah yes, but I suppose I ... it’s due for me to feel this way. I’ve committed many...ahh... I’ve committed many bad things in my life.

CONCLUSIONS

We found that scripted encounters were more accurately and quickly interpreted with RSMI than with the more commonly used methods of RCMI, PCMI, and proximate consecutive ad hoc interpreting. RSMI resulted in fewer errors of clinical consequence compared with non-RSMI modes.

Although it is easy to understand why RSMI would take less time, given its simultaneous nature, RSMI was also more accurate. Our study does not allow us to understand why this may be the case, but it may be that clinical information is more accurately transmitted because there is minimal time lag: interpreters do not need to recall large amounts of information. Furthermore, the minimal time allowed to interpret in RSMI may also inhibit editing or advocacy by the interpreters.

RSMI may present other advantages. The time savings afforded by RSMI may allow the clinician to be more patient-centered and address issues beyond the technical aspects of care. Limited previous research on RSMI demonstrates increased physician and patient engagement. Hornberger demonstrated increased physician and patient utterances, questions and explanations, and physician and patient satisfaction when comparing RSMI with PCMI.17

RSMI requires only marginally more training than consecutive methods. Trainings at our institution are 48 hours for introduction to consecutive medical interpreting and 60 hours for introduction to simultaneous medical interpreting.

Our method of determining the clinical significance of potential medical errors was necessarily subjective, and it is not possible from our controlled study to estimate the impact that the coded errors might have had in actual clinical encounters. The high rate of agreement between coders suggests that the method was reliable. However, we believe that our use of standardized dialogues, while allowing for fair comparison across modes of interpretation, may have resulted in an undercount of potential medical errors. Regardless of the number and severity of the errors in interpretation, the actors adhered to the scripted dialogue. In a real-life interpretation, such errors could have resulted in critical mistakes. Furthermore, there were multiple low-grade errors, which synergistically could produce errors of greater severity and consequence. The scripts were weighted more heavily on diagnostic as opposed to therapeutic aspects of the medical encounter, thereby limiting the potential for assessed clinical consequences. In the patient–physician dialogues, interpreters struggled with the names of medications and with dosages. Medication usage was discussed, but not actually initiated. As a result, there was a tendency to rank medication errors as being of lesser clinical significance than they would be in real-life circumstances, when medications are actually prescribed.

Despite the fact that a standard training and quality management program was used, the results from this study cannot necessarily be generalized to all interpreter services or clinical settings. Our study only examined the performance of 6 interpreters using only 1 non-English language (Spanish). It is possible that outcomes might differ in settings with different staffing, training, languages, or quality management programs. It is also possible that both absolute and relative error rates might be different in real-life, as opposed to the simulated, scripted clinical encounters that we used to “level the playing field” across different interpreting methods.

The United States is home to an increasingly diverse population. Immigrants face multiple barriers to effective health care. Of these, language is key. Efficient, accurate medical interpreting strategies need to be widely disseminated. We found that RSMI provided advantages in terms of speed and accuracy, when compared to other interpreting modalities, and may therefore be a promising option for patients needing language assistance. Future studies are needed, however, to examine the comparative advantages and disadvantages, and the cost-effectiveness of different interpreting modalities across clinical settings and patient populations.