Introduction

Registries were designed to acquire comprehensive data on treatment effects and the natural course of diseases [1]. In spine surgery in particular, registries were adopted early. The first structured, large-scale spine registry was the Swedish Spine Registry; Spine Tango was the first international one [2, 3]. We thus look back on more than two decades of experience with registry data, which are increasingly used for clinical science and quality management.

Randomized controlled trials (RCTs) provide level I evidence for treatment strategies but have the disadvantage of studying well-defined subgroups in a standardized environment that does not reflect clinical reality.

Registries have the hypothetical advantage of acquiring data from a complete population with a specific disease treated in a real-world setting. Clinical studies based on registry data are therefore considered to ideally complement RCTs, and vice versa.

Data originating from clinical registries are also becoming increasingly important for quality management purposes to all stakeholders of our healthcare systems, e.g., health insurers, healthcare providers, politicians, professionals and their respective scientific societies. For example, new legislation in Europe requires that medical products such as spinal implants can be individually traced and correlated with clinical data. Only registries can fulfill these requirements. It is further to be expected, or already in place, that decisions regarding reimbursement of treatment or even sanctions against hospitals and/or surgeons are derived from comparative registry data.

Because these two purposes are highly important and may have far-reaching consequences, the quality of registry data itself, in terms of completeness and accuracy, becomes of utmost importance.

In strong contrast to this stands the fact that few spine registries undergo external monitoring, which would be a first prerequisite for ensuring data quality.

Another obstacle for almost all registries, with the exception of the Scandinavian ones, is that most national data protection laws require strictly anonymized data entries, which prevents any relevant and accurate follow-up of very important markers such as reoperation or 30-day readmission rates.

Thus, the aim of this study was to investigate the accuracy and completeness of current registry data using the example of the German Spine Society registry. Since we are convinced that the findings can be extrapolated to most other registries, we consider our message important beyond the spine community.

Methods

Ethics approval and consent to participate

The local ethics committee of our university confirmed the study (registration number 42/16). All patients gave written informed consent to the registry.

Literature search prior to the study

We searched for currently ongoing registries and their data quality via Medline, Google, clinicaltrials.gov, and inquiries to societies. We included registries with a focus on scientific, socioeconomic and healthcare provider perspectives that are used and acknowledged by national scientific societies or governmental organizations. For the purpose of this article, we did not consider registries of scientific organizations and/or individual universities/hospitals that focus on one specific spinal disease or one scientific question. Search terms were “audit,” “registry,” “accuracy,” “data completeness,” and “monitoring.”

Study design

This study was conducted in accordance with the STROBE guidelines for cohort studies.

The German Spine Society (Deutsche Wirbelsäulengesellschaft, DWG) was contacted and confirmed the institutional certification of 23 German centers. Institutional certification is linked to mandatory data input into the DWG registry.

All centers were contacted, and the participating centers were chosen according to Fig. 1. The minimum mandatory requirement upon certification was to document in the registry all inpatient data of patients undergoing spine surgery. For the audit visit, a 2-week period ending 2 months before the initial contact with the centers was defined; all patients who had undergone spine surgery during this period were analyzed. In a prospective study design, audits by a board-certified neurosurgeon were then planned and conducted at all participating centers. The percentage of surgeries not documented in the registry at the time of the first contact concerning the study, as well as the overall rate of undocumented cases, was analyzed. Moreover, any discrepancy between the patients’ charts and the registry entry was detected and analyzed at the audit visit. A mean of 30.95 ± 3.05 items per patient (median 31, range 0–40) and a mean of 20.92 ± 7.96 cases per center (median 20, range 11–38) were analyzed.

Fig. 1
figure 1

Flowchart: This chart shows the exclusion of several centers due to a considerable number of missing data entries

Variables

The analyzed items per case included main diagnosis, type of fracture, type of degeneration, classification of spondylolisthesis and osteoporosis, affected levels, type of tumor, previous surgeries, type and duration of symptoms, date of surgery, surgeon, implants, type of surgery, postoperative course, complications, etc. Table 1 provides a comprehensive overview.

Table 1 Analyzed items

Statistical analysis

SPSS v24 (IBM Corp., Armonk, NY) and R (version 3.1.0; R Foundation for Statistical Computing, Vienna, Austria; https://www.r-project.org/) were used for the analysis. The objective of our study was, on the one hand, to deliver data on the general accuracy of data originating from a large national registry per se and, on the other hand, to identify potential factors leading to an increased rate of wrong entries depending on the item.

Results are presented as proportions or means with 95% confidence intervals (CI). Wilcoxon–Mann–Whitney and Kruskal–Wallis statistics were used to test for differences in the accuracy rates of the entries depending on (a) monitoring of the data entry and (b) the person performing the data entry.
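The analyses were carried out in SPSS and R; as a minimal sketch of the underlying procedures, the two building blocks, a 95% confidence interval for a proportion and a rank-based two-sample test, can be written in plain Python. The counts below are invented for illustration, and the normal-approximation Mann–Whitney p-value is a simplification of the exact routines the statistical packages implement.

```python
import math

def wald_ci_95(successes, n):
    """95% Wald confidence interval for a proportion (normal approximation)."""
    p = successes / n
    half = 1.96 * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def mann_whitney(x, y):
    """Mann-Whitney U with a two-sided normal-approximation p-value
    (no tie correction; illustrative only)."""
    nx, ny = len(x), len(y)
    # U counts pairs (xi, yj) with xi < yj; ties contribute 0.5
    u = sum(1.0 if xi < yj else 0.5 if xi == yj else 0.0
            for xi in x for yj in y)
    mu = nx * ny / 2.0
    sigma = math.sqrt(nx * ny * (nx + ny + 1) / 12.0)
    z = (u - mu) / sigma
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return u, p

# Invented example: 30 wrong entries out of 200 audited entries.
lo, hi = wald_ci_95(30, 200)
print(f"15.00% [{lo:.2%}-{hi:.2%}]")
```

The Wald interval is the simplest choice; for small samples or extreme proportions, a Wilson or exact interval would be preferable.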

Results

Overall data completeness

Out of 23 certified spine centers contacted, 17 centers were willing to participate, but 4 were still lacking any data entries. Even in the remaining 13 centers eligible for audits, 28.50% (95%-CI = [22.46–34.55]) of entries were finalized only after the audits were announced. Thus, 71.50% (95%-CI = [65.45–77.54]; range between centers: 18.18–100%) of all operated patients were completely entered into the registry at the time of the first contact (Fig. 2).

Fig. 2
figure 2

Completed entries upon first contact: This chart shows the percentage of completely entered cases into the registry in relation to all entered patients per center at the time of the first contact for the study

At the time of the visit, only 82.55% (95%-CI = [79.12–85.98]; range between centers: 20.41–100%) of surgeries were documented in the registry, although all centers are required to enter all operated spine cases without exception (Fig. 3).

Fig. 3
figure 3

Missing whole cases: This chart shows the percentage of data entries relative to all operated patients per center

Incorrect entries per center

On average, 14.95% (95%-CI = [10.93–19.00]) of entries were inaccurate, with variation between centers (range: 6.21–27.44%) (Fig. 4). The setup also varied considerably between centers: registry entries were made by the operating surgeon in 10 centers and by a study nurse in three centers. Regarding the influence of the person entering the data, there were significantly fewer errors when the surgeon performed the data entry (12.02% [10.47–13.58]) than when a study nurse did (15.65% [14.22–17.68]) (Fig. 5; p = .0004).

Fig. 4
figure 4

Wrong entries per center: This box plot shows the percentage of wrong data entries of the registry versus the patient charts in relation to all data entries per center

Fig. 5
figure 5

Entering person: This box plot shows the influence of the entering person: there is a significant difference with fewer errors if the surgeon performs the data entry (p = .0004)

Table 1 shows the percentage of wrong registry entries per item. Wrong entries per item varied widely, from 0.4 to 50% of entries. It is worth mentioning that complications, one of the most useful data elements from large registries, showed an error rate of 50%.

Influence of internal monitoring

In four centers, no internal monitoring was performed. Two centers performed an internal check of whether the patient was entered into the registry (quantity) but without any content control, while seven centers performed quality monitoring, done either by an attending or by a study nurse. Notably, internal monitoring was not associated with fewer wrong entries (no monitoring: 10.10% [8.75–11.46]; quantitative monitoring: 17.83% [11.96–23.67]; qualitative monitoring: 14.72% [12.29–17.15], p = .008; Fig. 6a). Comparing “no qualitative monitoring” (12.00% [10.16–13.84]) vs. “qualitative monitoring” (14.72% [12.29–17.15]), we also did not observe any significant difference (p = .084; Fig. 6b).

Fig. 6
figure 6

Internal monitoring: This box plot demonstrates the percentage of wrong entries depending on the presence or absence of internal monitoring. a For all three groups (no, quantity, and quality; p = .008); b “no qualitative monitoring” versus “qualitative monitoring” (p = .084)

Discussion

Added value of this study

Our study raises awareness of the general shortcomings of registries, even those supervised by large societies, and will help to improve future registry design and healthcare decisions. It is particularly valuable as the first study to determine the accuracy and completeness of a large nationwide registry.

In light of today’s socioeconomic and health-political influence of data gained from large national registries, this study provides profound evidence that the design and structure of most current registries do not provide data reliable enough for political, medical, or economic decision making.

General aspects

Our results indicate that the data of the DWG registry are, for the time being, suitable neither for clinical science nor for quality management. Compliance with mandatory data input was merely moderate. Completeness of data was not satisfactory on average and outright unacceptable in a significant fraction of individual centers, despite the fact that only the minimum data set was required, i.e., a reduced inpatient data set without follow-up. Accuracy of data was also not satisfactory, again with a wide variation between centers. Non-mandatory internal monitoring influenced neither completeness nor accuracy of the data.

Data accuracy and audits

Various measures could improve data quality and should also be pursued, such as a strict definition of validated key metrics for each disease, enabling automated data extraction, and providing adequate financing for sufficient resources.

The DWG registry examined in this study is based on the Spine Tango platform. As mentioned above, the reliability and usefulness of the results are inherently dependent on the quality and completeness of the data entered into the registry. RCTs undergo regular monitoring visits to ensure accurate and complete data entry, but most current registries lack mandatory comprehensive auditing procedures. The problems identified in this registry therefore seem inherent to the majority of spine registries, and probably beyond.

Of course, in order to prevent selection bias and to account for actual variances between centers concerning outcomes, multivariate analyses adjusting for covariates are required [4]. But multivariate analyses can only handle known covariates or biases. If these are unknown or not recorded by the registry, there is no way to purge the registry data of these influences, potentially leading to wrong or unrepresentative results. Thus, the finer the granularity enabled by the required items per patient, the more accurate the data set and the less likely it is to suffer from unknown bias. However, finer granularity means more effort for the centers, which in turn leads to incomplete data sets. As Table 1 shows, some items are more prone to incorrect entries than others, and it is important to control these issues and change the items themselves accordingly.
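Why covariate adjustment matters can be illustrated with a minimal, entirely hypothetical example: if case mix differs between centers (here an invented simple/complex split), crude error rates can even reverse the within-stratum ordering, the classic Simpson's paradox.

```python
# Entirely hypothetical counts: (wrong_entries, total_entries) per center,
# stratified by an invented case-complexity covariate.
data = {
    "center_A": {"simple": (1, 20),   "complex": (28, 140)},
    "center_B": {"simple": (10, 140), "complex": (5, 20)},
}

def crude_rate(center):
    """Overall wrong-entry rate, ignoring case mix."""
    wrong = sum(w for w, n in data[center].values())
    total = sum(n for w, n in data[center].values())
    return wrong / total

def stratum_rates(center):
    """Wrong-entry rate within each complexity stratum."""
    return {s: w / n for s, (w, n) in data[center].items()}

# Center A looks worse overall (18.1% vs. 9.4%) although it is better
# within EVERY stratum: its case mix is simply more complex.
print(crude_rate("center_A"), crude_rate("center_B"))
print(stratum_rates("center_A"), stratum_rates("center_B"))
```

A stratified or multivariate analysis recovers the per-stratum comparison; a crude comparison of center totals would rank the centers the wrong way around.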

It is important to note that the large variance between centers (Fig. 4) also indicates that we need to differentiate between inherent problems of the registry itself causing low data accuracy and completeness, and the culture in the participating centers themselves.

Likewise, a large amount of missing data will also limit the value of registry data. Thus, direct and real-time internal monitoring of data quality and integrity is crucial when designing and planning a registry. Moreover, without an active external monitoring or auditing process, the danger of selective reporting/entry is obvious. In particular, the reported rate of complications seems extremely low in our study (Table 1). Bias can arise from factors such as omitting whole groups of high-risk patients or liberally interpreting what constitutes a complication. Both factors are known to occur if no rigorous guidelines for reporting are established and ultimately controlled. To our knowledge, this is not the case for any European spine registry, at least.

Follow-up versus data protection

Another problem, not examined in this study, is the lack of relevant follow-up. Two aspects exist. First, follow-up rates tend to be low in general, but for clinical research, collecting follow-up data remains crucial. It is known that any loss to follow-up larger than 20% of enrolled cases can lead to significant bias in the statistical evaluation [5]. In a study by McGirt et al., the follow-up rate in spine registries was found to be between 22 and 79% [6]. Thus, even the registry with the best follow-up rate fell short of the minimum recommendation. In our study, only 71.50% (95%-CI = [65.45–77.54]; range between centers: 18.18–100%) of all operated patients were completely entered into the registry at the time of the first contact (Fig. 2), and this only reflects follow-up until discharge.
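The distorting effect of loss to follow-up can be made concrete with a small arithmetic sketch, with all numbers invented for illustration: if patients who suffer a complication are more likely to be lost (e.g., because dissatisfied patients present elsewhere), the observed complication rate systematically underestimates the true one.

```python
# Hypothetical cohort (all numbers invented for illustration).
n = 1000
true_complications = 100                 # true complication rate: 10%

# Assumed differential loss to follow-up: 40% of patients WITH a
# complication are lost, versus 15% of patients without one.
followed_with = int(true_complications * (1 - 0.40))            # 60 followed
followed_without = int((n - true_complications) * (1 - 0.15))   # 765 followed

observed_rate = followed_with / (followed_with + followed_without)
true_rate = true_complications / n

# The registry would report ~7.3% instead of the true 10%.
print(f"true: {true_rate:.1%}, observed: {observed_rate:.1%}")
```

The size of the distortion depends entirely on the assumed loss rates, which a registry without patient-level tracing cannot measure.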

This issue relates to the current state of data protection legislation in most countries and is problematic for quality management. To calculate some of the key indicators of quality of care, such as 30-day readmission or reoperation rates, a patient would need to be traced irrespective of the hospital managing the complication. Particularly in urban areas with a high density of specialists, a significant number of patients will seek a second opinion for problems occurring after the index surgery. Identification of a given patient for longitudinal follow-up is therefore crucial. Outside Scandinavia, however, this is not possible in any country we are aware of, which means that registries in those countries provide data that will clearly underestimate important quality indicators.

Further approaches to improve data quality

To address the imminent problems raised by this study, raising awareness among peers by teaching the potentially crucial implications of registry data, as well as possible financial incentives, could be used. Changing the culture is another component.

Yet, the involved centers were the first certified spine centers in Germany and, not only by this fact but also by personal knowledge, staffed by highly motivated professionals, in part leaders in the field. Thus, blaming a lack of motivation, culture, or education on the topic may not serve as a complete solution. Other issues, such as staff shortages, the amount of required data, and the lacking usability of the registry software itself, should also be taken into account.

Conclusion

Due to the high inaccuracy, the high number of centers lacking mandatory entries altogether, and the number of false entries, these data lead us to advocate unannounced audits and other measures to improve the situation, such as financial incentives and education on the benefits and consequences of such data. The current data should not be used for the time being, since wrong conclusions would inevitably be drawn. Aspects for improving the situation were identified.