Interpreter proxy versus healthcare interpreter for administration of patient surveys following arthroplasty: a pilot study
Clinical quality registries and other systems that conduct routine post-discharge surveillance of patient outcomes following surgery may have difficulty surveying patients who have limited proficiency in the language of the healthcare provider. Interpreter proxies (family and carers) are often used because access to certified healthcare interpreters is limited by cost or availability. The aim of this study was to assess the reliability of engaging interpreter proxies compared with certified healthcare interpreters for the administration of patient-reported health-related surveys for people with limited English proficiency (LEP).
People with LEP who were due for a routine 6-month telephone follow-up after knee or hip arthroplasty were invited to participate. Participants were randomly allocated to having their first interview with an interpreter proxy or a certified healthcare interpreter, followed by a second (crossover) interview using the alternative method within 2 weeks (range: 4 to 12 days) of the first. Agreement between the two methods was assessed using quadratic weighted Cohen’s kappa, intraclass correlation and concordance correlation coefficients, where appropriate, for EQ-5D health domains, total Oxford hip and knee scores, patient satisfaction, operation success, readmission, reoperation, and post-surgical complication responses. The mean of the differences between the same data items collected by each of the two methods was also calculated.
Eighty-five participants (96%) completed the study. There was substantial to excellent inter-rater agreement (kappa = 0.69–0.87 and ICCs above 0.74) for all but one measure. The mean differences between family proxy and healthcare interpreter scores for each participant were small, ranging from 0.01 (score range of 1–5) to 0.72 (score range of 0–100).
These results suggest that using interpreter proxies is a reliable alternative to certified healthcare interpreters in conducting patient-reported health surveys, potentially making this process easier and more cost-effective for researchers and registries.
Keywords: Osteoarthritis, Arthroplasty, Clinical quality registry, Interpreter, Reliability
ACORN: Arthroplasty Clinical Outcomes Registry National
CCC: Concordance correlation coefficient
ICC: Intraclass correlation coefficient
Quadratic Weighted Cohen’s Kappa
LEP: Limited English proficiency
OHS: Oxford Hip Score
OKS: Oxford Knee Score
Patient-reported outcome measures
Hip and knee arthroplasties are common surgical procedures, and patients are commonly surveyed post-operatively to ascertain their health outcomes after treatment. A major challenge to the collection of such data is the inclusion of people with limited or no proficiency in the language of the healthcare provider or the surveyor. For those with limited English proficiency (LEP), for example, satisfaction with healthcare may be lower, and they may have a greater risk of serious medical events. In an Australian study, non-English speaking status was a predictor of lower postoperative International Knee Society scores and more severe self-reported pain following knee arthroplasty. These observations suggest it is imperative that LEP patients have their health outcomes included in surveillance programs if the outcomes are to reflect the entire population.
For many outcome measurement programs, patients are surveyed by telephone. The costs of professional interpreter services and the logistical difficulties of finding multilingual staff to administer questionnaires in the patients’ native languages mean that family or carers (who speak the requisite language) are often used as interpreters (termed interpreter proxies). However, the reliability of using interpreter proxies in this setting has not been compared to the use of professional interpreters.
The aim of the study was to determine whether, in patients with LEP, the use of interpreter proxies provides adequately reliable survey results compared to using certified healthcare interpreters, who are considered the ‘gold standard’.
We used a randomised crossover study design to compare survey outcomes between two groups: surveys conducted using interpreter proxies (family members and carers), and those conducted using certified healthcare interpreters.
The study setting was within the Arthroplasty Clinical Outcomes Registry National (ACORN), a clinical quality registry that collects health data on patients undergoing elective hip or knee arthroplasty surgery in multiple hospitals. Post-operative data collection is conducted 6 months post-surgery by telephone and approximately 12% of participants have LEP.
The participants included in this study were ACORN patients who were due for their 6-month follow-up call between March and September 2015. Inclusion criteria were: having identified themselves as requiring an interpreter and/or having LEP preoperatively, cognitive capacity to understand the follow-up questions, fluency in a language for which a healthcare interpreter is available from the South West Sydney Interpreter Services, and also having access to a family or friend proxy who is able to interpret between English and their desired language.
English proficiency was ascertained in the pre-admissions clinic through a sensitive set of questions. Patients were asked, “How well do you speak English?” Those who answered “very well” were deemed English-proficient, while those who answered “well” or “not well” were asked a second question: “In what language do you prefer your medical care?” Those nominating ‘English’ as their preferred language were classified as English-proficient. Those nominating another language or who were unable to answer the first question were classified as having LEP and were included in the screening process for this study.
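The two-question screen described above can be sketched as a small decision function. This is an illustrative Python sketch only, not the registry's software: the function name, argument names and return labels are hypothetical, and treating an unrecognised first answer as "unable to answer" is an assumption.

```python
from typing import Optional


def classify_english_proficiency(
    spoken_english: Optional[str],
    preferred_language: Optional[str] = None,
) -> str:
    """Return 'proficient' or 'LEP' following the two-question screen.

    spoken_english: answer to "How well do you speak English?"
        (None if the patient was unable to answer).
    preferred_language: answer to "In what language do you prefer
        your medical care?" (only asked after 'well' / 'not well').
    """
    if spoken_english is None:
        # Unable to answer the first question -> LEP.
        return "LEP"

    answer = spoken_english.strip().lower()
    if answer == "very well":
        return "proficient"

    if answer in {"well", "not well"}:
        # Second question decides the classification.
        if preferred_language and preferred_language.strip().lower() == "english":
            return "proficient"
        return "LEP"

    # Assumption: any other response is handled as "unable to answer".
    return "LEP"


if __name__ == "__main__":
    print(classify_english_proficiency("very well"))
    print(classify_english_proficiency("well", "English"))
    print(classify_english_proficiency("not well", "Vietnamese"))
    print(classify_english_proficiency(None))
```

Under these assumptions, only patients classified as 'LEP' would move on to the screening step for the study.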
Patients with LEP who met the above criteria were sequentially ordered according to when their interviews were due and were contacted by telephone. If patients provided verbal consent with the assistance of the healthcare interpreter or interpreter proxy, they were randomly allocated to having their first interview via an interpreter proxy or via a certified healthcare interpreter. The randomisation was carried out according to a computer-generated sequence prepared before the commencement of the study and concealed in sequentially numbered envelopes containing allocation details. The envelopes were opened immediately after the patient provided consent.
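A computer-generated allocation sequence of the kind described above might be prepared along the following lines. The paper does not state the randomisation scheme, so this Python sketch assumes simple (unrestricted) randomisation; the seed, the number of envelopes and all names are hypothetical.

```python
import random

# The two interview orders a participant can be allocated to.
ARMS = ("interpreter proxy first", "healthcare interpreter first")


def generate_allocation_sequence(n_envelopes: int, seed: int) -> list:
    """Return a reproducible list of allocations, one per envelope.

    Assumption: simple randomisation, i.e. each envelope is an
    independent fair draw between the two arms.
    """
    rng = random.Random(seed)  # dedicated generator so the sequence is reproducible
    return [rng.choice(ARMS) for _ in range(n_envelopes)]


# Prepared before recruitment starts (hypothetical seed and size).
sequence = generate_allocation_sequence(n_envelopes=100, seed=2015)

# Sequentially numbered envelopes: envelope 1 goes to the first
# consenting patient, envelope 2 to the second, and so on.
envelopes = {number: arm for number, arm in enumerate(sequence, start=1)}

if __name__ == "__main__":
    for number in range(1, 6):
        print(number, envelopes[number])
```

Simple randomisation does not guarantee equal group sizes; a blocked scheme would be an alternative if balance between the two interview orders were required.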
The first interview was conducted within 1 month either side of the 6-month post-operative date, as per routine follow-up. We considered the condition to be stable at 6 months. This was followed by the second interview, using the alternative method, within 2 weeks of the first interview. The interview questions were asked in English by the research officer, and questions and responses were translated to and from the appropriate language by the interpreter proxy or certified healthcare interpreter. Interviews with certified interpreters were conducted with the assistance of the call centre manager at the Sydney South West Local Health District Language Services, who connected the research officer, interpreter and patient in a 3-way conference call. Interviews with interpreter proxies were performed in a 2-way telephone call with the research officer on one end and the proxy translating the interview questions and responses to and from the patient on the other end.
Satisfaction: “How would you describe the results of your operation” (a 5-point Likert scale - ‘excellent’-1, ‘very good’-2, ‘good’-3, ‘fair’-4, ‘poor’-5).
Success: “Overall, how are the problems with your knee/hip now compared to before your operation” (a 5-point Likert scale - ‘much better’-1, ‘a little better’-2, ‘about the same’-3, ‘a little worse’-4, and ‘much worse’-5).
Complications: “Have you experienced any complications after the operation since being discharged from hospital”; a standard list of common complications was read out.
Readmission: “Were you admitted to hospital again since leaving hospital after the knee/hip replacement?” answered as yes or no.
Reoperation: “Have you had another operation on the same joint that was operated on?” answered as yes or no.
Patient-reported health status using the EuroQoL EQ-5D-5L and EQ-VAS questionnaires (English version): The EQ-5D-5L rates the patient’s mobility, personal care, usual activities, pain/discomfort and anxiety/depression levels on separate 5-point Likert scales, in which for each category a score of ‘1’ represents the best outcome and a score of ‘5’ the worst. The EQ-VAS rates the patient’s overall health along a visual scale from zero to 100, where zero refers to the worst health and 100 the best health. The English version of the questionnaires was used and read out by the interpreters and interpreter proxies in the patient’s desired language.
Joint-specific patient-reported pain and function were assessed using the Oxford Hip Score (OHS) and Oxford Knee Score (OKS) (English version). Each is a 12-question survey using a Likert scale (0–4) that asks about the patient’s perceived difficulty or pain with performing everyday movements and tasks. The summary score ranges from 0 to 48, with 48 denoting the best outcome [6, 7]. English versions of the Oxford scores were used and read out by the interpreters and interpreter proxies in the patient’s desired language.
Two extra questions were asked to determine the number of years the patient and interpreter proxy had spent living in Australia: “In what year did you (and interpreter proxy) first arrive in Australia to live here for one year or more?” These questions correspond with those asked in the 2011 Australian Census of Population and Housing.
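The Oxford summary score described above (12 items, each scored 0 to 4, summed to a total between 0 and 48) can be sketched as follows. This is a minimal Python illustration; the function name and the example responses are hypothetical, and the handling of missing items is not addressed here because the paper does not describe it.

```python
def oxford_total(item_scores):
    """Sum 12 Oxford Hip/Knee Score items (each 0-4) into a 0-48 total.

    A total of 48 denotes the best outcome and 0 the worst.
    """
    if len(item_scores) != 12:
        raise ValueError("The Oxford scores have exactly 12 items")
    for score in item_scores:
        if score not in (0, 1, 2, 3, 4):
            raise ValueError(f"Item score out of range 0-4: {score!r}")
    return sum(item_scores)


if __name__ == "__main__":
    # Hypothetical responses from one participant.
    responses = [4, 3, 4, 2, 3, 4, 4, 3, 2, 4, 3, 4]
    print(oxford_total(responses))
```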
Our convenience sample exceeded the minimum sample size of 50 patients required to detect an ICC of 0.50 with 90% power and 5% significance. Ordinal data were analysed using quadratic weighted Cohen’s kappa coefficients, intraclass correlation coefficients (ICC) measuring absolute agreement and Lin’s concordance correlation coefficient (CCC) for each outcome measure to assess the magnitude of agreement between interpreter proxy and healthcare interpreter. In addition, a Wilcoxon signed-rank test for paired data was performed on each measure to assess the statistical significance of the differences obtained between the two methods of interview administration. Nominal data measures were analysed with an unweighted Cohen’s kappa coefficient. The EQ-VAS was treated as continuous data and was analysed using ICC and CCC.
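As a rough illustration of two of the agreement statistics named above, the following Python sketch computes Cohen’s kappa (quadratic weighted or unweighted) and Lin’s CCC from paired responses using the standard textbook formulas. It is not the authors’ analysis code, which was written in R; the function names and the small data set are hypothetical.

```python
def cohen_kappa(ratings_a, ratings_b, categories, weights="quadratic"):
    """Cohen's kappa for two paired sets of categorical ratings.

    categories: ordered list of all possible response levels.
    weights: "quadratic" for ordinal data, "unweighted" for nominal data.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Ratings must be paired and non-empty")
    index = {category: i for i, category in enumerate(categories)}
    k, n = len(categories), len(ratings_a)

    # Cross-tabulate the paired responses.
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement_weight(i, j):
        if weights == "quadratic":
            return ((i - j) / (k - 1)) ** 2
        return 0.0 if i == j else 1.0

    observed_dis = sum(
        disagreement_weight(i, j) * observed[i][j]
        for i in range(k) for j in range(k)
    ) / n
    expected_dis = sum(
        disagreement_weight(i, j) * row_totals[i] * col_totals[j]
        for i in range(k) for j in range(k)
    ) / n ** 2
    if expected_dis == 0:
        raise ValueError("Kappa is undefined when all ratings are identical")
    return 1 - observed_dis / expected_dis


def lin_ccc(x, y):
    """Lin's concordance correlation coefficient for paired continuous data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


if __name__ == "__main__":
    # Made-up paired responses on a 5-point scale (proxy vs interpreter).
    proxy = [1, 2, 2, 3, 5, 4]
    interpreter = [1, 2, 3, 3, 5, 5]
    print(cohen_kappa(proxy, interpreter, categories=[1, 2, 3, 4, 5]))
    # Made-up EQ-VAS style scores on a 0-100 scale.
    print(lin_ccc([70, 80, 55, 90], [75, 78, 60, 85]))
```

Quadratic weighting penalises a disagreement by the squared distance between the two response levels, so adjacent-category differences on a Likert scale reduce kappa far less than differences at opposite ends of the scale.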
The differences in scores between the two methods were also determined for the EQ-5D, EQ-VAS and Oxford scores and visualised using Bland-Altman plots. The mean of these differences was calculated to evaluate potential bias, and 95% limits of agreement were plotted to indicate the range within which 95% of the differences lie. All data analysis was performed using R open-source statistical software version 3.2. Figures were generated using RStudio version 0.99.
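The bias and limits of agreement underlying a Bland-Altman plot can be computed as in the sketch below. This is an illustrative Python version using the conventional formula (mean difference ± 1.96 standard deviations of the differences), not the authors’ R code; the direction of subtraction (proxy minus interpreter), the function name and the data are assumptions.

```python
from statistics import mean, stdev


def bland_altman_limits(proxy_scores, interpreter_scores):
    """Return (bias, lower_limit, upper_limit) for paired scores.

    bias: mean of the paired differences (assumed proxy - interpreter).
    limits: bias +/- 1.96 * sample SD of the differences, the range
    expected to contain about 95% of differences.
    """
    if len(proxy_scores) != len(interpreter_scores):
        raise ValueError("Scores must be paired")
    differences = [p - i for p, i in zip(proxy_scores, interpreter_scores)]
    bias = mean(differences)
    spread = 1.96 * stdev(differences)
    return bias, bias - spread, bias + spread


if __name__ == "__main__":
    # Made-up paired responses on a 5-point scale.
    proxy = [3, 2, 4, 1, 5]
    interpreter = [2, 2, 5, 1, 4]
    bias, lower, upper = bland_altman_limits(proxy, interpreter)
    print(f"bias={bias:.2f}, limits of agreement={lower:.2f} to {upper:.2f}")
```

A bias close to zero with wide limits of agreement would indicate that the two methods agree on average across the cohort while individual patients may still differ between interviews.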
The outcomes assessed were the levels of agreement between the two methods of language interpreting as determined by Cohen’s kappa coefficients, intraclass correlation coefficient (ICC) and concordance correlation coefficient (CCC) statistics where appropriate. The Cohen’s kappa coefficients were interpreted in accordance with guidelines put forward by Landis and Koch. Coefficients between 0.21 and 0.40 were considered to show fair agreement, scores between 0.41 and 0.60 moderate agreement, scores between 0.61 and 0.80 substantial agreement, and scores above 0.80 almost perfect agreement.
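The interpretation bands listed above can be expressed as a simple lookup. This Python sketch is illustrative: the function name is hypothetical, the ‘poor’ and ‘slight’ bands below 0.21 come from the original Landis and Koch scale rather than from this paper, and treating each upper bound as inclusive is an assumption.

```python
def landis_koch_label(kappa: float) -> str:
    """Map a kappa coefficient to its Landis and Koch agreement label."""
    if kappa < 0:
        return "poor"            # not listed in this paper
    if kappa <= 0.20:
        return "slight"          # not listed in this paper
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    for value in (0.69, 0.87, 1.0):
        print(value, landis_koch_label(value))
```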
Demographic profile of the study population
Columns: Int Proxy First (n=44) | HC Int First (n=41)
Rows: Sex, n (%) | Age, mean (SD) | Patient years in Australia, mean (SD) | Proxy years in Australia, mean (SD) | Surgery Type, n (%)
Languages utilised in this study
Columns: Mean years in Australia
Agreement scores of patient-reported outcome measures
Columns: ICC (95% CI) | CCC (95% CI) | 95% Limits of Agreement
0.81 (0.73 - 0.87) | -0.93 to 0.88
0.66 (0.52 - 0.76) | -1.11 to 1.15
-0.99 to 1.2
-1.07 to 1.02
-1.05 to 1.16
0.78 (0.68 - 0.85) | -22.7 to 24.1
-6.46 to 6.72
Mean differences stratified by interview order
Columns: 95% Limits of Agreement
-0.87 to 0.91 | HC Interpreter first: -0.99 to 0.85
-1.25 to 1.21 | HC Interpreter first: -0.94 to 1.09
-1.17 to 1.40 | HC Interpreter first: -0.76 to 0.95
-1.22 to 0.95 | HC Interpreter first: -0.86 to 1.06
-1.16 to 1.35 | HC Interpreter first: -0.90 to 0.95
-25.20 to 32.60 | HC Interpreter first: -15.70 to 10.60
-7.57 to 8.84 | HC Interpreter first: -4.48 to 3.65
The high level of agreement overall between interpreter proxies and healthcare interpreters found in this study suggests that utilising carers or family members to interpret is adequately reliable for capture of patient-reported outcomes after arthroplasty.
Healthcare interpreters have been noted to deliver a higher standard of interpreting quality with fewer translation errors when compared with non-certified ad hoc interpreters. As such, the healthcare interpreters should be adept at clarifying any confusion over the meaning of certain words in the questionnaire and also clearly relay how the patient’s recovery profile is represented by the responses given.
The perfect agreement (kappa = 1) between the responses regarding readmission and reoperation was expected assuming that the question was accurately asked by both types of interpreters as these are significant events for the patient. Agreement for determining complication was lower (kappa = 0.69), a discrepancy which may have been due to uncertainty over the exact nature of a complication, which was defined as requiring active management but not readmission or reoperation.
The results showed that patients who had a hip arthroplasty recorded higher levels of agreement for all outcomes apart from complication, despite the only differences in questioning being the three joint-specific items in the Oxford questionnaires. This may have been due to the fact that patients who undergo knee operations experience greater day-to-day variability in their pain and function levels (possibly even dependent on weather conditions), which would account for the lower agreement levels in the EQ-5D which assesses daily situation compared with the Oxford scores which assess the most recent 4 week period.
Similar 95% limits of agreement have also been seen in test-retest studies of reliability, indicating that the differences seen between methods in this study may be explained by the normal week-to-week variation of responses to the survey questions, which may also incorporate true health changes between surveys. While patient recovery following arthroplasty largely plateaus after 6 months, the 1-month window either side of the 6-month date which we allowed for conducting interviews may have confounded the test-retest reliability.
However, once the entire sample population is considered, the overall bias or the mean difference on the Bland-Altman plots is consistently very small. With results stratified to interview order, there was no statistically significant difference in the mean scores except for marginal differences in the EQ-VAS score. This indicates that patients were not continuing to improve in the time between interviews. This suggests that the methods are similar when comparing groups of patients but that differences are seen at the level of the individual patient. However, as stated above, we consider this to be reflective of the reliability of the survey questions rather than due to the method of data collection.
A limitation of the study was the inability to assess the effect that individual languages may have on the level of agreement. The sample size was insufficient, and the number of languages too large, to allow analysis of individual languages. While languages could potentially be grouped into geographical regions of origin, these classifications do not necessarily reflect cultural diversity and the effects this may have on patient-reported outcomes. The study was limited to patients who had undergone arthroplasty surgery, and these findings may not be generalisable to other settings, particularly where socially sensitive topics may be discussed, which may limit the accuracy of proxy interpreters.
Another limitation is that we used English versions of the validated surveys and not those specific to the language of the patient being interviewed. Due to this, variation in the linguistic skills of proxy interpreters may have affected the accuracy of results. This approach, necessitated by a lack of availability of all required languages for each survey at the time of the study, reflected the pragmatic approach used by this registry that removes the need to restrict participation based on survey availability.
The benefits of professional interpreters to the patient in a clinical setting are well recognised, with the literature agreeing that patient satisfaction and quality of care are greatest when hospital-trained interpreters and telephone interpreters are utilised [14, 15, 16]. However, the inconvenience and cost of using healthcare interpreters are a barrier to participation in data collection.
These results suggest that using interpreter proxies is a reliable, efficient and likely cost-effective alternative to using healthcare interpreters when conducting telephone surveys of patient-reported outcomes after health interventions. Despite differences seen at the individual level, when the entire cohort is considered, there is an insignificant difference between the two methods of interview.
The authors wish to thank Katina Varelis, Nitin Sharma, Zareena Azimullah and the interpreter service of the Sydney South West Local Health District Language Services for organizing interpreter calls throughout the project.
EA, JMN and IAH conceived and designed the study. EA submitted the study protocol for ethics approval. DX, TC, EA and RM contributed to data collection. DX and TC performed the data analysis with input from JMN and IAH. DX, TC and IAH prepared the manuscript. All authors read, reviewed and approved the manuscript.
Funding for the engagement of certified healthcare interpreters of Sydney South West Local Health District Language Services was provided by internal institutional funds of the Whitlam Orthopaedic Research Centre. No additional funding was provided for the design of the study, collection, analysis, and interpretation of data and in writing the manuscript for this study.
Ethics approval and consent to participate
The study was approved by Hunter New England Human Research Ethics Committee and all procedures performed were in accordance with the ethical standards of the 1964 Helsinki declaration.
Informed consent was obtained verbally from all individuals included in the study with the assistance of certified interpreters or carers and family members. A verbal consent form approved by Hunter New England Human Research Ethics Committee was read out and completed for each participant enrolled in the study.
Consent for publication
The authors declare that they have no competing interests.
- 8. Australian Bureau of Statistics. 2011 Census Household Form – text version. 2013. Available from: http://www.abs.gov.au/websitedbs/censushome.nsf/home/2011hhftranscript%5C. Accessed 16 Oct 2015.
- 10. R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria; 2015.
- 14. Garcia EA, Roy LC, Okada PJ, Perkins SD, Wiebe RA. A comparison of the influence of hospital-trained, ad hoc, and telephone interpreters on perceived satisfaction of limited English-proficient parents presenting to a pediatric emergency department. Pediatr Emerg Care. 2004;20(6):373–8.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.