Background

Patient-reported outcome measures (PROMs) are a reliable tool for understanding patients’ perceived outcomes, and are typically administered by telephone or mail follow-up through a centralised registry [1]. Complications are often included in such registries because these data can be analysed to inform best practice and reduce the rates of these complications and their associated economic and social burden [2–5].

The Arthroplasty Clinical Outcomes Registry, National (ACORN) is an Australian orthopaedic registry that collects clinical and patient-reported outcomes by telephone interview at six months (± one month) after elective primary and revision total hip arthroplasty (THA) and total knee arthroplasty (TKA). The ACORN 2016 annual report documented post-discharge complication rates of 0.055 to 5.4% for THA and 0.0 to 13.0% for TKA [6]. However, to use these data to understand complication rates following surgery and to influence current practice, their accuracy must be assessed.

Previous studies have compared patient-reported complications (PRC) to clinical examination and medical records in general, inguinal hernia repair, bone marrow transplant, varicose vein, spinal, prostate, gynaecological oncology and orthopaedic surgery [7–19]. Overall, these studies have indicated high negative predictive values (NPV, 95.0 to 98.2%) but low positive predictive values (PPV, 26.0 to 83.3%), and varying levels of concordance (56.4 to 97.2%) and agreement (11.0 to 100.0%). One study of bone marrow transplant patients showed sensitivity (52.9 to 100.0%) and specificity (75.4 to 100.0%) that varied with complication type [13]. Another study monitoring patients’ ability to report surgical site infections reported high sensitivity (83.3%) and specificity (93.7 to 98.1%) [16]. Sensitivity refers to patients’ ability to report the true presence of complications, whilst specificity refers to their ability to report the true absence of complications, when compared to health professionals. PPV and NPV refer to the likelihood that a patient’s positive or negative report of a complication, respectively, is correct. Of the three orthopaedic-specific studies, concordance was high for well-defined complications such as deep vein thrombosis and pulmonary embolism, but low for more ambiguous complications such as major bleeding and numbness [17–19].
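For reference, these quantities follow the standard definitions for a binary report assessed against a reference standard (here, the health professional’s record). The true/false positive/negative notation below is ours, added for clarity, and does not appear in the registry documentation:

```latex
\mathrm{Sensitivity} = \frac{TP}{TP+FN}, \qquad
\mathrm{Specificity} = \frac{TN}{TN+FP}, \qquad
\mathrm{PPV} = \frac{TP}{TP+FP}, \qquad
\mathrm{NPV} = \frac{TN}{TN+FN}
```

where TP denotes a complication recorded by both patient and clinician, FP a complication reported only by the patient, FN a complication recorded only by the clinician, and TN agreement that no complication occurred.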

This study aims to assess the reliability of PRC following THA and TKA recorded in ACORN compared to surgeons’ medical records by calculating sensitivity, specificity, PPV, NPV and agreement values.

Methods

Patient population

Patients were included in this study if they had completed a six-month post-operative follow-up interview with ACORN. Patients were included in the ACORN registry if they were 18 years of age or over, their arthroplasty procedure (primary or revision) of the hip or knee was elective, the surgery was undertaken at a hospital participating in the registry, and they were not cognitively impaired or unable to consent to participation. Approximately fifty patients who had most recently completed the six-month post-operative follow-up were randomly chosen for each of the six surgeons with the highest volume of procedures captured in ACORN in 2015.

This study was approved by the Hunter New England Human Research Ethics Committee (HREC) as an incorporated sub-study to ACORN. ACORN utilises a verbal opt-out consent process, informed by an HREC-approved written patient information sheet provided to and discussed with each patient at their pre-operative clinical assessment. The consent provides for patient data to be included in the ACORN registry and used for post-operative follow-up of their complications and outcomes via a telephone questionnaire, for the purposes of quality assurance and research [6].

Sample size calculation

The sample size required for statistical power to detect a Cohen’s kappa agreement statistic in the range 0.4 to 0.7, assuming a mean complication prevalence of 10% (range 1.0 to 15.0%) and a standard α of 0.05, was calculated using the kappaSize package for R, yielding a minimum sample size of 300 patients [20]. To obtain an even distribution across surgeons, data were acquired for at least 50 patients per surgeon who had most recently completed their six-month follow-up as at April 2016. An R script was used to randomly sample from all patients for each surgeon from January to October 2015 inclusive, and the sampling frame was extended backward into 2014 for surgeons whose patient volume was insufficient.
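A minimal sketch of how this calculation and the per-surgeon sampling could be reproduced in R is shown below. The exact calls used are not reported in the text, so the kappaSize interface shown (PowerBinary), the 80% power target and the followups data frame are assumptions for illustration only.

```r
## Sample size sketch: kappa0/kappa1 span the 0.4-0.7 range stated above,
## with prevalence 0.10 and alpha 0.05 as stated; 80% power is an assumption.
library(kappaSize)
PowerBinary(kappa0 = 0.40, kappa1 = 0.70, props = 0.10,
            raters = 2, alpha = 0.05, power = 0.80)

## Illustrative per-surgeon random sampling, assuming a hypothetical data
## frame 'followups' with columns surgeon_id and patient_id.
set.seed(2016)
sampled <- do.call(rbind, lapply(split(followups, followups$surgeon_id),
                                 function(d) d[sample(nrow(d), min(50, nrow(d))), ]))
```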

ACORN six-month follow-up data collection form

The full ACORN six-month follow-up data collection form includes several subjective questions regarding satisfaction and perceived success, the EuroQol 5 Dimensions 5 Level (EQ-5D-5L) and EuroQol Visual Analogue Scale (EQ-VAS) scores, and the Oxford Hip or Knee Score [6]. Complications are also captured as part of the registry data.

The questions regarding complications were grouped into readmission, reoperation and other complications. Response options for readmission and reoperation were Yes, No, or Unstated/unknown. Patients who responded Yes for readmission were asked the primary reason for, and hospital of, admission. Patients who responded Yes for reoperation were asked to state the reason for reoperation.

Patients who responded Yes to complications not requiring readmission were further asked to specify their complication with an open-ended question, so as not to prompt the patient. Callers then recorded the complications on the standard data collection form, which lists the following: surgical site infection (SSI) requiring oral antibiotics, SSI requiring intravenous antibiotics, deep vein thrombosis (DVT) index leg, DVT other leg, DVT both legs, pulmonary embolism (PE), dislocation, joint stiffness, bladder infection or urinary retention, fracture, unexpected pain, cardiac, stroke, leg length discrepancy, joint or lower limb swelling, paraesthesia or numbness, cellulitis, neuropathy, muscle weakness, respiratory infection, other, and unknown.

Readmission, reoperation and twenty-two separate complications were considered in this study. Twenty of these complications (excluding other and unknown or not stated) were arranged into groups based on similarity, in order to additionally assess validity and agreement within broader categories, as shown in Table 1.

Table 1 Complications grouped by category used for analysis
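The categories used in the analysis are those defined in Table 1 and are not reproduced here. As a purely hypothetical illustration of how such a mapping can be represented and applied in R (the group labels and membership below are ours, for illustration only):

```r
## Hypothetical group labels and membership; the categories actually used
## in the analysis are those shown in Table 1.
complication_groups <- list(
  infection      = c("SSI requiring oral antibiotics",
                     "SSI requiring intravenous antibiotics", "cellulitis"),
  thromboembolic = c("DVT index leg", "DVT other leg", "DVT both legs",
                     "pulmonary embolism")
)

## A patient is positive for a broader category if any member complication
## was recorded for them; 'reported' is a character vector of complications.
group_positive <- function(reported, group) any(reported %in% group)
```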

Data abstraction

Information on post-discharge complications for each patient was abstracted by the lead author from the electronic medical records maintained by each surgeon in their practice. Abstraction was repeated for a subset of patients to confirm the reliability of the data collection. The items abstracted were the same as those collected from each patient at the six-month post-operative follow-up interview, using the same questionnaire used by ACORN. Patients without a record of follow-up review by their surgeon were excluded and replaced with the next patient from the randomised list. Data collection continued until at least 50 patients with follow-up had been collected for each surgeon. Where time permitted, data for additional randomly chosen patients were collected from each surgeon’s records.

To avoid possible bias, the investigator abstracting data from surgeons’ private practices was blinded to the results of the six-month follow-up interviews for the selected patients in the ACORN database. Once data abstraction from surgeons’ rooms was complete and the results had been recorded in a database, these records were locked and could not be changed.

Statistical analysis

The data were analysed by calculating the sensitivity, specificity, PPV, NPV, percentage agreement and unweighted Cohen’s kappa coefficient. Cohen’s kappa is a measure of inter-rater reliability which adjusts for chance agreement between raters, and is usually interpreted categorically: values less than or equal to zero denote no agreement; 0.01–0.20 slight agreement; 0.21–0.40 fair agreement; 0.41–0.60 moderate agreement; 0.61–0.80 substantial agreement; and 0.81–1.00 almost perfect agreement. The surgeons’ medical records were treated as the gold standard in this study.
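A minimal sketch of these calculations is shown below, assuming each complication has been coded as a pair of 0/1 vectors ('patient' from the ACORN interview and 'surgeon' from the medical record, the latter treated as the reference standard); the function name and data layout are ours, not the study’s.

```r
## Per-complication validity and agreement metrics from two 0/1 vectors.
complication_metrics <- function(patient, surgeon) {
  tp <- sum(patient == 1 & surgeon == 1)   # complication recorded by both
  fp <- sum(patient == 1 & surgeon == 0)   # reported by patient only
  fn <- sum(patient == 0 & surgeon == 1)   # recorded by surgeon only
  tn <- sum(patient == 0 & surgeon == 0)   # neither recorded a complication
  n  <- tp + fp + fn + tn

  p_obs <- (tp + tn) / n                   # percentage agreement
  ## chance agreement from the marginal totals (unweighted Cohen's kappa)
  p_exp <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2

  c(sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp),
    ppv         = tp / (tp + fp),
    npv         = tn / (tn + fn),
    agreement   = p_obs,
    kappa       = (p_obs - p_exp) / (1 - p_exp))  # undefined when p_exp == 1
}
```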

To investigate whether additional factors influence these classification and agreement metrics, analyses were also performed with patients categorised by surgeon, by the joint operated on, and by the time between surgery and follow-up review (which, unlike the ACORN follow-up, is not always at six months for surgeon follow-up). The surgeon follow-up times were categorised into <6 weeks, 6–8 weeks, 3–5 months, 6 months and 6–12 months, and each category was individually compared to the 6-month ACORN data. All analyses were completed using the R environment for statistical computing, version 3.3.3 [21, 22] (Additional file 1).
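The subgroup analyses could be reproduced along the lines of the sketch below, which reuses complication_metrics() from the sketch above; the data frame 'dat' and its column names are assumptions for illustration.

```r
## 'dat' is assumed to hold one row per patient-complication comparison, with
## 0/1 columns 'patient' and 'surgeon' plus grouping columns 'surgeon_id',
## 'joint' and 'fu_category' (surgeon follow-up time band).
subgroup_metrics <- function(dat, by) {
  do.call(rbind, lapply(split(dat, dat[[by]]),
                        function(d) complication_metrics(d$patient, d$surgeon)))
}

subgroup_metrics(dat, "surgeon_id")   # by surgeon
subgroup_metrics(dat, "joint")        # THA vs TKA
subgroup_metrics(dat, "fu_category")  # by surgeon follow-up time
```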

Results

In the random sampling process, 364 patients were selected, of whom 340 had at least one recorded review with a surgeon within six months of surgery. Overall, there were more females than males, and more TKA than THA. No significant differences in patient characteristics were observed between surgeons, apart from a systematic difference in follow-up time, which was driven by each surgeon’s usual practice. Surgeons A, B and E reviewed the bulk of their patients within 8 weeks, whereas surgeons C, D and F reviewed closer to the six-month mark.

Table 2 summarises the characteristics of selected patients and time between follow-up consultations, compared by surgeon.

Table 2 Summary of patient characteristics

A total of 163 complications were reported by 77 patients. The results of the complete analysis are summarised in Table 3.

Table 3 Validity and agreement values for PRC when compared to surgeons’ notes

The proportions of positive agreements (true/true, denoted TT) were low across all complications, with the highest rates observed for readmission and unexpected pain. The highest rates of FT (surgeon/patient) disagreement were for stiffness, lower limb swelling and paraesthesia, whilst the highest rate of TF (surgeon/patient) disagreement was for unexpected pain. Negative agreement (FF) rates were high throughout all complications due to the low prevalence of most complications. As a result, low sensitivity and PPV values and high specificity and NPV values were observed.

With the exception of readmission and reoperation, patients’ sensitivity did not exceed 0.33 when compared to surgeons. Overall sensitivity was 0.14, with nine complications having no result due to the absence of true positives and false negatives, and six complications having zero sensitivity due to an absence of true positives. PPVs were similarly low, with a highest value of 0.38 and an overall value of 0.13. In contrast, the lowest specificity value was 0.92, and specificity was 0.98 overall. NPV was equally high at 0.98 overall, and all values were greater than 0.90 with the exception of unexpected pain (0.82).

Twenty-one of 24 complication types showed greater than 90% agreement, and agreement was 96.31% overall. In terms of the kappa statistic, two complications showed fair agreement, five slight agreement and eight no agreement. Aside from urinary infection (0.40), values were lower than or equal to 0.28, and an overall kappa value of 0.11 was observed. For seven complications, patients and surgeons unanimously reported no occurrences, yielding 100% agreement and a kappa of 1.00.

Table 4 provides subgroup analyses by surgeon, joint and follow-up time.

Table 4 Validity and agreement values for PRC when compared with surgeons’ notes, categorised by surgeon, joint and follow-up time

Discussion

Orthopaedic clinical registries are becoming increasingly popular due to their ability to monitor results of surgery in a time- and cost-efficient manner, whilst incorporating the patient’s perspective in the assessment of their surgery. To use this information to influence current practice, however, the accuracy of these data must be assessed.

This study has demonstrated that when patient-reported complication data from a clinical registry are assessed against surgeon notes, they show high specificity, NPV and percentage agreement. These results indicate that patients are able to accurately report that they did not experience any complications, as seen in the high true negative rates. Since the rates of complications following THA and TKA are low, this study suggests that registries are adequately valid and reliable for assessing overall complication rates following these procedures.

On the other hand, very low sensitivity and PPV were demonstrated, indicating that precise rates of specific complication types may not be adequately estimated from patient-reported data, a finding in concordance with previous studies. Three studies of patients identifying surgical site infections and one of hernia repair patients showed that PRC typically had lower PPV and sensitivity than NPV and specificity [9, 15, 16]. Registries may be a better tool for assessing complications if these values could be improved, but this is challenging when complication rates are low, as small degrees of disagreement can have large effects on the calculated sensitivity and PPV values, effects which in turn are masked in the specificity and percentage agreement values by the large number of true negatives (the kappa coefficient is addressed below). Nevertheless, if the true negative results are ignored, there were only 21 (7.1%) instances in this study of patients and surgeons agreeing on the presence and type of complications, compared with 276 (92.9%) instances of disagreement, out of a total of 297 comparisons. This indicates that, when complications do occur, patients cannot reliably report their occurrence.

Where patient and surgeon reports disagreed, patients were more likely to over-report complications in most categories. High rates of patient reporting of stiffness, paraesthesia and muscle weakness may of course be reasonable, if we assume patients are more inconvenienced by minor complications than surgeons often believe. Patients were more likely to under-report leg length discrepancies and superficial infections. Patients with minor leg length discrepancies may have few symptoms, which may explain the under-reporting, whereas surgeons place great importance on leg length due to its possible detrimental outcomes [23]. This study showed that patients were poor at accurately identifying SSI, a finding also observed in a study by Zellmer et al., who suggested that this may be improved with validated infection education material [24].

Complication data in ACORN are collected as yes or no answers. However, complications such as swelling and stiffness are to some extent expected events following surgery, as part of the natural healing process. The time frame and degree of debility caused by these complications should be sought, rather than their mere presence. Dushey et al. noted similar deficiencies in questionnaires and proposed that quantitative or degree-of-seriousness criteria be added when enquiring about the less objective complications [17]. A similar study of general surgery procedures interviewed patients by asking whether “an adverse outcome had occurred between discharge and 30 days after discharge”, and found that patients grossly over-reported complications because they described their symptoms (e.g. pain and fever), whereas surgeons recorded diagnoses (e.g. infection) [14]. Another study allowed patients to freely describe whether “any complications [arose] as a consequence of [their] operation three months ago?” The authors critiqued their own methodology and concluded that clear definitions could improve concordance rates [11].

ACORN callers similarly enquire whether the patient has experienced a complication and then ask the patient to specify details, without prompting for specific types of complication. Although this provides a window into patients’ experience and recollection of post-operative complications, it does not provide information on their extent or severity. For example, muscle weakness can refer to anything from slight difficulty in movement to complete immobilisation. Clinical registries such as ACORN may benefit from further enquiring about the severity of complications.

This study observed high rates of false negative results for unexpected pain, as did Visser et al. and Franneby et al. [8, 14]. Joint pain is a major reason for patients undergoing THA and TKA, and hence it was expected that patients would over-report pain if it continued following their procedures. This may be an incorrect assumption, and the observed results may instead arise because the questionnaire refers specifically to “unexpected pain”, whilst surgeons (who do not follow a pro forma set of questions) may have noted any pain that the patient reported. This suggests that patients may in fact expect a certain amount of pain following surgery, and that additional measures to clarify and quantify these complications in questionnaires may help improve the accuracy of PRC.

The three existing orthopaedic studies seeking to validate PRC were limited in that they only assessed the accuracy of patients who reported complications, and not the accuracy of those who reported no complications. This prevented them from measuring validity values such as sensitivity and specificity, and made them prone to selection bias because only the group of patients who reported complications was investigated. Two of these studies [17, 19] referenced the same study by Parimi et al. [25], which reported false negative rates of 0.28% in patients reporting simply whether they had had a THA, and hypothesised that similar false negative rates may occur with PRC. A strength of our study was that it investigated both groups, and was therefore able to assess the ability of patients to accurately report both when they did have a complication and when they did not. The findings of this study support the assumptions made by Dushey et al. and Greenbaum et al. that false negative rates are low.

Although Cohen’s kappa accounts for chance agreement, the literature has noted that its assumptions about rater independence may overestimate chance agreement, thereby underestimating the agreement value [26, 27]. The implication for this study is that, because of the low incidence of many of the complications, the kappa statistics are not robust to very small changes in agreement.
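The following sketch illustrates this point with hypothetical counts (not study data): at a complication prevalence of roughly 2%, reclassifying a single patient roughly halves kappa while percentage agreement changes by less than half a percentage point.

```r
## Hypothetical 2x2 counts; tp/fp/fn/tn follow the usual definitions, with the
## surgeon's record treated as the reference standard.
kappa2x2 <- function(tp, fp, fn, tn) {
  n <- tp + fp + fn + tn
  p_obs <- (tp + tn) / n                                          # percentage agreement
  p_exp <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2  # chance agreement
  c(agreement = p_obs, kappa = (p_obs - p_exp) / (1 - p_exp))
}

round(rbind(before = kappa2x2(tp = 2, fp = 5, fn = 5, tn = 328),
            after  = kappa2x2(tp = 1, fp = 5, fn = 6, tn = 328)), 3)
##        agreement kappa
## before     0.971 0.271
## after      0.968 0.137
```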

Patients followed up outside the six-month mark were not excluded, because in clinical practice not all patients are reviewed at the same time point by their surgeons. This was addressed by including subgroup analyses in which patients reviewed at six months post-surgery could be compared to those followed up at other times; these showed no significant differences. Further, this study accepted the surgeons’ records as the gold standard, as they reflect real-life surgeon awareness of patient complications. Studies have discussed that although surgeons may have a better idea of what constitutes a “true” medical or surgical complication, only the patients have the complete picture of adverse events [8, 11, 28]. Surgeons may also be susceptible to overlooking minor complications and keeping incomplete or inaccurate records, and alternative care sought from other health services for complications will not have been captured [14]. However, using patient recollection as the gold standard has its own set of problems, as it may be subject to recall bias. Both sources have limitations, and this study suggests that agreement values (percentage agreement) may be more appropriate in assessing the accuracy of PRC.

Conclusion

Accurate yet efficient ascertainment of complication rates following surgery remains a highly important aspect not only of surgeon appraisal, but also of patient satisfaction and continuing improvement in medical care. The high concordance for true negative results, along with the high specificity, NPV and percentage agreement found in this study, is encouraging, as it indicates that complication rates following THA and TKA are low and that PRC are accurate in this regard. However, the low sensitivity and PPV must be improved, and we suggest that clearer wording of the questionnaires used by registries to elicit these data from patients would aid in achieving this.