Introduction

In clinical research, it is crucial to question how true and accurate data are; however, assessments of data validity and accuracy are rarely published explicitly. National medical registries collect large-scale data during the dynamic workflow of daily clinical practice and have become essential sources for evidence-based medicine and health care policy. Register-based studies reflect everyday practice and have high external validity, and they complement randomized controlled trials (RCTs), which assess smaller populations and have lower external validity. Register data are collected and recorded by healthcare personnel, not by dedicated research assistants. Therefore, it is essential to periodically assess the quality of register data reported by healthcare personnel and patients by validating them against other data sources [1,2,3]. Because systematic errors can lead to bias, register validations may affect the robustness of medical and political conclusions based on register data. The literature on the validity of medical register data is sparse. Some studies have reported good validity of medical and cancer registries [4,5,6]. However, a recent validation study of a German spine registry (DWG) showed high inaccuracy [7], and the authors recommended against using these register data.

Our study aimed to assess the accuracy and agreement of data from the Norwegian Registry for Spine Surgery (NORspine) by comparing them with electronic patient records (EPRs). Such information can help identify pitfalls and conceptual problems related to data collection that are relevant not only to other spine registers but also to other registers that routinely record clinical data.

Patients and methods

In this cross-sectional study, we reviewed the electronic patient records (EPRs) of patients operated for lumbar spinal stenosis (LSS) who consented and responded to NORspine between January 1, 2015, and December 31, 2016. The authors were authorized to access data from four public hospitals within one health region in Norway (South-Eastern Norway Regional Health Authority). To assess the representativeness of our sample, we compared the study population with the patients treated at the remaining hospitals.

In Norway, all 39 hospitals that offer surgery for degenerative spinal disorders are obliged to report data to NORspine (coverage = 100%). Seventy percent of all patients who undergo elective spine surgery in Norway are included in NORspine, and the proportion that responds one year after surgery is 74% [8].

A NORspine data set includes a preoperative form completed by the patient at admission for surgery. This form covers items related to sociodemographic and lifestyle variables (e.g., smoking, height, and weight) and a standard battery of questionnaires assessing pain and disability (Table 5). Immediately after completing surgery, and optimally while still in the operating theater, the surgeon completes a standardized form and reports the clinical and radiological diagnosis, relevant comorbidities, ASA classification (usually as graded by the anesthetist), and details about the surgery, e.g., previous surgery, surgical access, surgical methods, and level(s) operated. The surgeon also reports perioperative complications from a predefined list (Table 6).

Patients report the clinical outcome at 3 and 12 months after surgery as assessed by standard Patient-Reported Outcome Measures (PROMs).

Electronic patient records (EPRs) consist of non-structured text documents (free text) recorded in the DIPS® software under predetermined headings. We reviewed the EPRs using a standard empty NORspine form, and the investigators (OKA and SK) had no access to the corresponding data previously recorded in NORspine. The study group selected a set of NORspine variables that could be recaptured from EPRs. Furthermore, we reviewed EPR documents (e.g., admission and surgeon’s notes) from the same time point as the surgery recorded in NORspine. We did not assess variables that were not registered routinely or consistently in EPRs, such as PROMs, symptom duration, marital status, education level, mother tongue, and working capability. The clinical follow-up at the treating centers was not standardized and was performed at different time points at the hospitals without structured recording in the EPR. Hence, follow-up data (including reoperations) in NORspine were not evaluated against EPRs in this study.

The EPRs of 22 patients were independently reviewed by two raters (OA and SK) to estimate interobserver reliability.

We calculated concordance in terms of agreement when comparing the structured NORspine data with EPR data; we also calculated accuracy for dichotomous variables, using EPR as the gold standard. We chose to report both accuracy and agreement because the use of certain EPR variables as a reference could be questioned (e.g., smoking and comorbidity).
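As an illustration of accuracy against a reference source, the following minimal Python sketch computes the proportion correctly classified (PCC) and the sensitivity for one dichotomous variable, treating the EPR value as the reference. The function name and the toy data are hypothetical and are not study data; the sketch does not reproduce the statistical software used in the study.

```python
def accuracy_measures(registry, reference):
    """Return (PCC, sensitivity) for paired binary observations.

    registry  -- values recorded in the registry (1/0 or True/False)
    reference -- values from the reference source (here: the EPR)
    """
    if len(registry) != len(reference):
        raise ValueError("registry and reference must be paired")
    pairs = list(zip(registry, reference))
    true_pos = sum(1 for r, e in pairs if r and e)
    true_neg = sum(1 for r, e in pairs if not r and not e)
    ref_pos = sum(1 for _, e in pairs if e)
    # PCC: share of patients where both sources give the same value
    pcc = (true_pos + true_neg) / len(pairs)
    # Sensitivity: share of reference-positive patients also positive in the registry
    sensitivity = true_pos / ref_pos if ref_pos else float("nan")
    return pcc, sensitivity


# Hypothetical toy data: 10 patients, 3 with a complication according to the EPR,
# of which the registry captured 1.
epr_values = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
registry_values = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
pcc, sensitivity = accuracy_measures(registry_values, epr_values)
print(f"PCC = {pcc:.2f}, sensitivity = {sensitivity:.2f}")
```

In this toy example the PCC is 0.80 although only one of three reference-positive patients was captured (sensitivity 0.33), which shows why the two measures are worth reporting together when the recorded event is uncommon.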

The NORspine form requires the surgeon to report relevant comorbidities from a list, such as cardiovascular disease, diabetes, and osteoarthritis. In the EPR, comorbidity is recorded irrespective of its relevance to the planned spinal surgery. Consequently, agreement and accuracy were not evaluated for comorbidities. We only compared frequencies of relevant comorbidities recorded in NORspine vs. the corresponding comorbidities recorded in EPRs. Furthermore, we assessed the agreement for ASA classification between the two data sources.

Statistical analyses

Baseline data were described using means with 95% CIs (continuous data) and proportions (categorical data). Accuracy was assessed by proportion correctly classified (PCC) and sensitivity. Perioperative complications were grouped into eight categories (Table 6), and the accuracy of complication recording was assessed by class average accuracy (CAA) using the micro-averaged method. Agreement between NORspine and EPRs was assessed by Cohen's kappa (ƙ) or Fleiss weighted kappa (ƙ) for categorical variables (dichotomous and ordinal variables). (ASA classification was analyzed as an ordinal variable, ranging from 1 to 5, in the agreement analysis.) For continuous variables, we calculated the intraclass correlation coefficient (ICC) using a two-way mixed model to assess absolute agreement [9]. We classified agreement (ƙ-value) as minimal (0.21–0.39), weak (0.40–0.59), moderate (0.60–0.79), strong (0.80–0.90), and almost perfect (> 0.90) [10]. Agreement according to ICC values was classified as poor (< 0.50), moderate (0.50–0.75), strong (0.75–0.90), and excellent (> 0.90) [11]. Finally, we calculated the prevalence of missing values for each variable. The results are presented as point estimates with 95% confidence intervals (CI).
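To make the agreement measure concrete, the sketch below computes an unweighted Cohen's kappa for two paired categorical ratings and maps a kappa value to the bands listed above. It is a minimal illustration under stated assumptions: the function names and toy ratings are hypothetical, the weighted kappa used for ordinal variables is not covered, and the label for values below 0.21 is an assumption because that band is not named in the text.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two paired lists of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be paired and non-empty")
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement from the marginal distributions of each source
    expected = sum(count_a[c] * count_b[c] for c in set(count_a) | set(count_b)) / n**2
    if expected == 1.0:
        return float("nan")  # kappa is undefined when both sources use one category only
    return (observed - expected) / (1 - expected)


def classify_kappa(kappa):
    """Map kappa to the bands used in the text (lower bounds used as cut points)."""
    if kappa > 0.90:
        return "almost perfect"
    if kappa >= 0.80:
        return "strong"
    if kappa >= 0.60:
        return "moderate"
    if kappa >= 0.40:
        return "weak"
    if kappa >= 0.21:
        return "minimal"
    return "below minimal"  # assumption: this band is not named in the text


# Hypothetical toy ratings for 10 patients (1 = yes, 0 = no)
registry = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0]
epr = [1, 0, 0, 0, 1, 0, 1, 0, 0, 1]
kappa = cohens_kappa(registry, epr)
print(f"kappa = {kappa:.2f} ({classify_kappa(kappa)})")
```

In the toy data the two sources agree for 8 of 10 patients, yet kappa is about 0.58 because roughly half of that agreement would be expected by chance.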

We used SPSS, version 26 (IBM Corp., Armonk, NY, USA) and Stata, version 16 (StataCorp LLC, College Station, TX, USA).

Ethical considerations

The Norwegian Regional Committee for Medical and Health Research Ethics approved this study (reference no. 2017(2157)), as did the data protection officers at the four hospitals. All patients had provided informed consent, and the study was conducted in compliance with the Declaration of Helsinki.

Results

NORspine recorded 3,843 patients operated for LSS during 2015 and 2016. The investigators were authorized to access EPRs at four hospitals and reviewed the EPRs of 474 consecutively operated patients (12.3% of the NORspine population). Mean age (95%CI) was 66 (65.3–67.2) years, and 254 (54%) were females. Overall, 0.9% of data were missing in NORspine (completeness 99.1%) and 2.8% in EPRs (completeness 97.2%) (Table 7).

Patient characteristics, including data on the rest of the NORspine patients operated for lumbar spinal stenosis, are shown in Table 1. Our sample differed somewhat from the rest of the NORspine population at baseline. The included patients had more comorbidity, higher BMI, and higher disability (ODI) and pain scores (NRS = numeric rating scale) for leg and back pain. In addition, the study population included more smokers and had fewer perioperative complications than the remaining spinal stenosis population registered in NORspine (Table 1). For a sample of 22 patients, the interrater reliability for the two authors who reviewed EPR variables was almost perfect.

Table 1 Patient characteristics and perioperative details of 474 NORspine patients operated for spinal stenosis at four hospitals compared to 3369 from the remaining hospitals

Perioperative complications were recorded for 15 (3.2%) patients in NORspine and 30 (6.5%) patients in the EPRs. The agreement between NORspine and EPR was weak (ƙ (95%CI) = 0.51 (0.33–0.69)). The class average accuracy for all perioperative complications (eight categories combined) was 99.4%, and for dural tears alone, 97.0% were classified correctly (PCC). The sensitivity for recording a complication (95%CI) was 40% (23–58%) (Table 2).

Table 2 Accuracy and agreement of NORspine data for 474 spinal stenosis patients compared to their electronic patient records

As shown in Table 3, ASA classification (1–5) showed moderate agreement (ƙ (95%CI) = 0.73 (0.66–0.80)). Table 4 shows the differences in the prevalence of comorbidities. NORspine underreported comorbidities compared to EPRs.

Table 3 Agreement for NORspine data for 474 spinal stenosis patients compared to their electronic patient records, ordinal or continuous variables
Table 4 Prevalence of relevant comorbidities reported by NORspine compared to relevant comorbidities reported in EPRs for 474 patients operated for LSS

As shown in Table 2, previous surgery (yes or no) had an almost perfect agreement (ƙ (95%CI) = 0.93 (0.89–0.97)), a proportion classified correctly of 97.2%, and a sensitivity of 95.8%. The number of previous surgeries showed moderate agreement (ƙ (95%CI) = 0.62 (0.48–0.75)), as shown in Table 3.

Perioperative details (method of decompression, fusion, surgical access, spinal level operated) recorded by the surgeon showed moderate to almost perfect agreement between NORspine and EPR (ƙ = 0.76 to 0.98), and high proportions were classified correctly (93–99%). The sensitivity for the recording of perioperative details was high (92–99%).

Smoking status had an almost perfect agreement (ƙ (95%CI) = 0.93 (0.89–0.97)), a proportion correctly classified of 97.2%, and a sensitivity of 92.0%. Furthermore, as shown in Table 3, the patients' height, weight, and BMI showed excellent agreement between NORspine and EPRs (ICC = 0.99).

Discussion

This cross-sectional study compared Norwegian spine registry (NORspine) data to corresponding EPR data. We found a weak agreement for perioperative complications, a moderate agreement for ASA classification, a moderate to almost perfect agreement for perioperative details, and almost perfect agreement for demographics. NORspine underreported perioperative complications and comorbidity.

Perioperative complications had a weak agreement and were underreported (sensitivity of only 40%) in NORspine. For example, dural tears were recorded in 13 patients (2.7%) in NORspine and 25 patients (5.3%) in EPR. Physicians' underreporting of surgical complications has been previously reported [12,13,14,15,16,17]. In line with our findings, a Swedish study of medical registers by Øhrn et al. from 2011 showed that only 74 of 210 complications (35%) registered in a patient claim database had been recorded in the Swedish spine register [18]. Furthermore, a study validating German spine register data found wrong entries ranging from 10 to 50% for variables describing complications and reoperations [7]. Still, a sensitivity of 40% for surgeon-reported perioperative complications in the present study was unexpectedly low. We found a class average accuracy (CAA) for all perioperative complications of 99.4%; however, some of the complications listed are extremely rare, so correctly recorded absences dominate the calculation, and CAA may, therefore, overestimate the accuracy of complication reporting. Previously published data on the prevalence of perioperative complications range between 3 and 16% [19,20,21,22]. The corresponding number in NORspine was 3.2%, at the lower end of this range, which also indicates underreporting. EPRs documented perioperative complications in 6.5% of patients, a figure more concordant with previous studies. Perioperative complications are recorded in NORspine and EPR at the same time point, and these data sources should match. One possible explanation for the discrepancy between the frequencies of complications recorded in NORspine and EPRs is differing definitions; for example, a minor repaired dural tear may not be graded as a complication by some surgeons.
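The gap between a very high class average accuracy and a low sensitivity can be illustrated with a short Python sketch of micro-averaged accuracy, in which every patient and category is treated as one yes/no decision and the counts are pooled across categories. This is one common reading of micro-averaging and is an assumption; the exact computation used in the study is not described in the text. The function name and all per-category counts below are hypothetical and were chosen only to be of a similar order of magnitude to the study.

```python
def micro_accuracy_and_sensitivity(per_class, n_patients):
    """Pool one-vs-rest counts over all categories.

    per_class  -- list of (true_pos, false_pos, false_neg), one tuple per category
    n_patients -- number of patients, each assessed once per category
    """
    tp = sum(c[0] for c in per_class)
    fp = sum(c[1] for c in per_class)
    fn = sum(c[2] for c in per_class)
    cells = n_patients * len(per_class)  # one yes/no decision per patient and category
    tn = cells - tp - fp - fn            # correctly recorded absences
    accuracy = (tp + tn) / cells
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Hypothetical counts: 8 categories, 474 patients; all events placed in one
# category for simplicity (12 captured, 3 false positives, 18 missed).
counts = [(12, 3, 18)] + [(0, 0, 0)] * 7
accuracy, sensitivity = micro_accuracy_and_sensitivity(counts, 474)
print(f"micro-averaged accuracy = {accuracy:.1%}, sensitivity = {sensitivity:.1%}")
```

With these invented counts the pooled accuracy is about 99.4% while the sensitivity is 40%, because 3,759 of the 3,792 decisions are true negatives. When events are rare, sensitivity is therefore the more informative measure of how completely complications are captured.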

ASA classification showed a moderate agreement, and the means between the two data sources were similar (2.17 vs. 2.14), illustrating no tendency to either under- or over-classification. The German spine register validation study reported wrong entries for ASA classification in 25% of cases and showed that a relatively simple classification system might be reported inaccurately [7]. However, all classification systems are subject to interpretation and inherent disagreement. We considered the ASA classification recorded in EPRs by anesthetists as the gold standard. However, the surgeon completing the NORspine form could either miss or disagree with the ASA classification provided by the anesthetist or use an ASA score recorded elsewhere in the EPR.

Each comorbidity was underreported in NORspine, possibly because surgeons differ in which comorbidities they consider relevant; this illustrates a problem with the concept validity of this item in the NORspine questionnaire. Carreon et al. studied comorbidity in patients with spinal stenosis in 2003 [21]. They found prevalences similar to those we found in EPRs, which supports our conclusion that comorbidity was underreported in NORspine. Moreover, previous studies have found low accuracy for orthopedic surgeons performing coding of diagnoses and indications for surgery, assessing cognitive function, and registering antibiotic use [23,24,25]. The discrepancy in the recorded prevalence of depression and anxiety in NORspine vs. EPRs may indicate that spine surgeons are not sufficiently aware of patients’ mental health and how mental health may influence the clinical results (PROMs) after spinal surgery.

One should consider alternative ways of assessing comorbidity. However, other comorbidity scoring systems, such as frailty scores and comorbidity indices (Charlson comorbidity index (CCI) and Elixhauser comorbidity index) [26, 27], are more complex, possibly affecting response rates and accuracy. We found ASA classification to be the most feasible comorbidity measure, and it displayed moderate agreement in our study. Mannion et al. found that ASA was a strong predictor of complications after spine surgery, and adding a more complex score (CCI) was not superior in predicting postoperative complications [28]. Hence, we recommend using ASA classification over more complex measures despite its limitations.

There was a discrepancy in accuracy between the different variables concerning previous surgery. Previous spinal surgery (yes/no) had an agreement of 0.93, and the number of previous surgeries had an agreement of 0.62; this indicates that NORspine is more precise in recording patients who had any previous surgery than the exact number of previous surgeries.

Perioperative details were accurately registered, with the proportion correctly classified at 93% or above. Agreement between NORspine data and the EPR data was almost perfect for most perioperative details, with kappa values above 0.90; this is in line with the literature, in which orthopedic surgeons coded surgical procedures and classified x-rays accurately [23, 24]. However, surgical access reported by the surgeon showed only moderate agreement between NORspine and EPR. Defining surgical accesses in NORspine may have been subject to interpretation, as surgeons may have misinterpreted the “lateral/Wiltzes’” choice as the direct lateral approach. Therefore, the NORspine board plans to clarify and amend options for surgical accesses in the next version of the surgeon-reported questionnaire.

Smoking status is recorded in the EPR based on a direct question to the patient and in NORspine as a simple yes or no question. The source of these two variables was the same: the patient. However, there was an error rate of 2.5% (PCC 97.5%) and an agreement of 0.93. This variable can indicate the rate of random error in NORspine. Patients’ height, weight, and BMI displayed excellent agreement. The patients themselves report these variables to NORspine, and their accuracy and agreement could serve as a benchmark for surgeon-recorded variables. It is questionable to define EPR as a gold standard because some variables could be more correctly reported by patients than by healthcare personnel. A further step to improve data quality could be to use a combined construct of patient- and physician-recorded variables [4].

About 1% of NORspine data values were missing, compared with 3% in the EPRs. This is in line with a 2002 literature review of data quality [29], which found 2% missing data in automatically collected and 5% in manually collected register data.

Our study has several limitations. We used EPRs as an external data source, although they may lack relevant information. EPR data might not be an appropriate reference for some variables, so we chose to report both accuracy and agreement. Agreement is the more appropriate measure when no clear reference standard exists. The EPRs at the four hospitals were not standardized (free text format), so relevant information could be missing or misinterpreted. On the other hand, every patient has an EPR, which has been defined as a gold standard in other validation studies [4,5,6,7] and has a high medical and legal status. Ideally, to be defined as a complete gold standard, the EPR should record PROMs.

Another limitation was potential selection bias due to the non-randomized selection of hospitals. The accuracy of NORspine and EPR data registration could differ between hospitals, limiting the generalizability of our findings. However, most of the differences in patient characteristics between the four selected and the remaining hospitals reporting to NORspine were small, and some of them might be incidental findings. Therefore, the authors consider the patient sample representative for the broader population of the NORspine. Patients analyzed in the present study were operated on and included during 2015–2016, and no relevant changes have been made in NORspine since 2015. Therefore, we believe that our findings are still relevant.

The selection of variables had to be limited to those available and suitable for comparison in both data sources. Therefore, the concordance of some relevant variables could not be assessed (e.g., patient-reported disability and pain).

We only assessed patients who underwent decompression due to spinal stenosis, who were treated with a limited number of simple procedures and surgical accesses. Our results may, therefore, represent a “best-case scenario” regarding the quality of NORspine data.

The strength of this study was a comprehensive and systematic review of a large number of EPRs at four hospitals. We assessed both accuracy (PCC and sensitivity) and agreement (kappa or ICC) of patient- and surgeon-reported data to validate different NORspine variables.

Future perspectives and implications

A long-term goal could be the inclusion of clinical registry data in a structured EPR. Structured EPRs have been implemented in Norway for hip fracture patients, and data from a structured EPR are sent directly to the national hip fracture audit. Structured EPRs can improve the quality of the EPR and the quality and completeness of registry data. Furthermore, structured EPRs could make valuable data more accessible to clinical research. A future perspective would be to integrate spine registers into a structured EPR.

Conclusions

This cross-sectional validation study showed that the Norwegian Registry for Spine Surgery (NORspine) tended to underreport perioperative complications of spine surgery compared to corresponding EPRs. This finding may represent a systematic error (information bias), and future register studies on complications after spinal surgery could cross-reference perioperative complications with other data sources to reduce the risk of underreporting. Comorbidities were also underreported in NORspine; the ASA classification seems the simplest and most reliable way to assess comorbidity. Perioperative details and patient-reported data had moderate to excellent agreement.