Introduction

Assessment of the quality of care by means of performance indicators is an integral part of modern-day health care. Performance indicators are a tool for quality improvement and give the government, physicians, patients, scientific societies and insurance companies an indication of hospital performance, which is increasingly demanded [1]. Because comparing performance indicator scores between hospitals can have major consequences, including lay-press ranking lists and sanctions by government and insurance companies, the scores need to be genuinely comparable.

Several steps lead from an event in clinical practice to a performance indicator intended to measure performance regarding that event [2]. This process is illustrated in Fig. 1. Variation in any of these steps will lead to different performance indicator scores. Ideally, data recorded for performance indicators are based on sound clinical practice guidelines, in which the definitions and the inclusion and exclusion criteria of the indicator are clear and unambiguous, and the data are then processed in a uniform way to calculate the indicator. In reality, however, definitions are far from unambiguous and data are recorded in a variety of ways, impeding the comparability of indicators for external quality control [3, 4]. Users of performance indicators therefore need to be aware of the possible impact of variations in definitions and of the quality of the data in terms of availability, accessibility and completeness [5, 6]. The more unambiguous the definitions and the higher the quality of the underlying data, the more likely the performance indicator scores are to be accurate and consistent between hospitals [7].

Fig. 1

Comparability of data: flow from collection to interpretation.

For patients diagnosed with ST-segment elevation myocardial infarction (STEMI), international guidelines recommend timely invasive treatment by primary percutaneous coronary intervention (PCI), generally within 90 min of first medical contact [8, 9]. Delays in primary PCI, caused for example by long residential distance, rapidly diminish its benefit over alternative treatments [10, 11], while shortening delays has the potential to reduce heart failure and mortality [12, 13]. It is, however, unclear to what extent treatment delay indicator scores are comparable between hospitals. This study therefore aims to investigate to what extent variations in definitions influence performance indicator scores. In addition, we investigate to what extent the quality of the data, in terms of availability, accessibility and completeness, influences performance indicator scores. We conclude by providing recommendations for improving the comparability of performance indicator scores.

Methods

Patient data

Secondary data were used from two university hospitals and five tertiary teaching hospitals performing PCI that participated in the acute coronary syndromes (ACS) program evaluation within the larger national 'VMS safety management program' [14].

Data from these seven hospitals were collected manually by six chart abstractors using standardised case report forms. All abstractors had a background in research and received instructions on the chart review procedures from JT and JE. The chart abstractors collected data by retrospective review of electronic or paper-based medical, nursing or catheterisation laboratory records of patients discharged between 1 January and 31 December 2012. Each month, eligible records of patients discharged in the preceding month were selected from the hospital billing system using the diagnosis treatment combination code. To determine the STEMI population, the chart abstractors first considered all records of patients diagnosed with ACS for inclusion. Next, they checked whether the discharge letter confirmed the ACS diagnosis. When the discharge diagnosis was unclear, the record was discussed with a cardiologist or other attending physician working in the field of cardiology. Charts of patients with a treatment delay not exceeding 6 h were included in the study [15]. Charts of patients without a discharge diagnosis of STEMI, those not undergoing an acute PCI, patients with secondary ACS (e.g. due to anaemia), those undergoing elective procedures, patients with missing or uninformative charts and patients under the age of 18 years were excluded. Chart abstractors signed a confidentiality agreement and all data were stored on a password-protected network server of the VU University Medical Centre.
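For illustration, the record selection described above can be expressed as a simple filter over reviewed charts. The sketch below is a minimal, hypothetical reconstruction; the field names (e.g. discharge_diagnosis, treatment_delay_h) are our own shorthand and were not part of the study's case report forms.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChartRecord:
    """Hypothetical summary of one reviewed chart (field names are illustrative)."""
    age: int
    discharge_diagnosis: Optional[str]   # e.g. "STEMI"; None if the chart was uninformative
    acute_pci_performed: bool
    secondary_acs: bool                  # e.g. ACS secondary to anaemia
    elective_procedure: bool
    treatment_delay_h: Optional[float]   # None if not derivable from the chart

def include_in_study(rec: ChartRecord) -> bool:
    """Apply the inclusion and exclusion criteria described in the Methods."""
    if rec.age < 18:
        return False
    if rec.discharge_diagnosis != "STEMI":
        return False
    if not rec.acute_pci_performed:
        return False
    if rec.secondary_acs or rec.elective_procedure:
        return False
    # Only charts with a treatment delay not exceeding 6 h were included
    if rec.treatment_delay_h is None or rec.treatment_delay_h > 6:
        return False
    return True
```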

Quality indicator definitions

Five definitions of the treatment delay indicator were derived from the literature (Table 1 and Fig. 2): (A) the Dutch 'VMS safety management program' guidelines [14]; (B) the adjusted Dutch 'VMS safety management program' evaluation [14]; (C) the mean door-to-needle time [15]; (D) the door-to-balloon time (American ACC/AHA guidelines for the management of STEMI [9, 16]); and (E) the European Society of Cardiology (ESC) guidelines for the management of STEMI [8]. In these five definitions, treatment delay was defined as: (A) PCI within 90 min of first medical/paramedical contact; (B) PCI within 90 min of the first electrocardiogram (ECG); (C) the mean door-to-needle time (no threshold provided); (D) PCI within 90 min of hospital arrival; and (E) PCI within 90 min of first medical contact. Definition B is an adaptation of definition A, because the time of first medical/paramedical contact was not registered consistently in all PCI centres, whereas the time of the first ECG was. For this study, treatment delay was therefore defined as the time from first ECG to PCI. Note, further, that indicator C asks for the mean door-to-needle time, illustrating that different organisations ask hospitals to register different information. Moreover, although none of the PCI centres registered the time of wire passage in the culprit artery, which is used by the ESC in the last definition, we include this definition as an illustration because these guidelines provide the basis for the first and second definitions. For this study, we regarded the time from first ECG to PCI as the reference standard for pragmatic reasons. We emphasise that this definition is not a gold standard, as there is no common gold standard for measuring treatment delay owing to national and international differences and differences in the perceptions of stakeholders. The definitions are used for comparison purposes and not to conclude which definition is best.
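To make the differences between the five definitions concrete, they can be encoded as a start time point, an end time point and an optional 90-min threshold. The sketch below is illustrative only; the time-point labels are our own shorthand, not field names used by the participating hospitals or the cited guidelines.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class IndicatorDefinition:
    label: str
    start: str                     # time point from which the delay is measured
    end: str                       # time point at which the delay ends
    threshold_min: Optional[int]   # None: no threshold, the mean is reported instead

DEFINITIONS = [
    IndicatorDefinition("A", "first_medical_contact", "pci", 90),           # VMS guideline
    IndicatorDefinition("B", "first_ecg", "pci", 90),                       # adjusted VMS (reference standard here)
    IndicatorDefinition("C", "hospital_arrival", "needle", None),           # mean door-to-needle time
    IndicatorDefinition("D", "hospital_arrival", "balloon", 90),            # ACC/AHA door-to-balloon time
    IndicatorDefinition("E", "first_medical_contact", "wire_passage", 90),  # ESC, wire passage in culprit artery
]
```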

Fig. 2

Delays from symptom onset to first intervention in patients with STEMI and five performance indicator definitions (A-E). GP general practitioner, EMS emergency medical services, ER emergency room.

Table 1 Definitions for the performance indicator ‘treatment delay’.

Outcome measures

Data quality

To investigate data quality (availability, accessibility and completeness), we assessed whether or not the time points involved in the various definitions were recorded in each of the hospitals. If the data were recorded, the researcher noted how they were accessible. Accessibility was divided into three categories: (1) automatically accessible, (2) partly automatically accessible or (3) manually accessible [3]. Automatically accessible meant that data elements stored within the hospital information system could be reviewed easily ('only a few mouse clicks away') and extracted by means of computerised search algorithms. Partly automatically accessible meant that data elements were available in the hospital information system and could be reviewed easily, but could not be extracted by means of a computerised search algorithm, so manual actions were required. Manually accessible meant that data elements were available, but only through intensive data handling such as paper-based medical record review. Additionally, two chart abstractors retrospectively noted per hospital where and in what form data were found, such as in medical records, nurse records, discharge letters, electrocardiograms, procedure letters or correspondence with other health care professionals, and whether these were in paper form, scanned or stored in the hospital information system. Finally, we assessed the completeness of the available information at the patient level, measured as the percentage of patients for whom all time points that should be recorded were indeed available.
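Completeness at the patient level, as used here, is simply the share of patients for whom every required time point is present. The following sketch shows this calculation under the assumption that each patient is represented as a dictionary of recorded time points; the time-point labels are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional

REQUIRED_TIME_POINTS = [
    "first_medical_contact", "first_ecg", "hospital_arrival",
    "sheath_insertion", "first_intervention",
]

def completeness(patients: Iterable[Dict[str, Optional[datetime]]],
                 required: Iterable[str] = REQUIRED_TIME_POINTS) -> float:
    """Percentage of patients for whom all required time points were recorded."""
    patients = list(patients)
    required = list(required)
    if not patients:
        return float("nan")
    complete = sum(all(p.get(tp) is not None for tp in required) for p in patients)
    return 100.0 * complete / len(patients)
```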

Influence of definitions on indicator scores

To investigate the influence of the performance indicator definition on the scores, we calculated, for each hospital and each definition, the percentage of patients whose treatment delay was below the threshold.
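A minimal sketch of this score calculation is given below: for one hospital and one definition, it returns the percentage of evaluable patients treated within the 90-min threshold, alongside the percentage of patients with missing time points. The data layout and field names are assumptions made for illustration, not the study's actual extraction scripts.

```python
from datetime import datetime
from typing import Dict, List, Optional

Patient = Dict[str, Optional[datetime]]  # time-point label -> recorded time (or None)

def indicator_score(patients: List[Patient], start: str, end: str,
                    threshold_min: int = 90) -> Dict[str, float]:
    """Share of evaluable patients within the threshold, plus share with missing data."""
    evaluable, within = 0, 0
    for p in patients:
        t0, t1 = p.get(start), p.get(end)
        if t0 is None or t1 is None:
            continue  # counted as missing data for this definition
        evaluable += 1
        if (t1 - t0).total_seconds() / 60.0 <= threshold_min:
            within += 1
    n = len(patients)
    return {
        "pct_missing": 100.0 * (n - evaluable) / n if n else float("nan"),
        "pct_within_threshold": 100.0 * within / evaluable if evaluable else float("nan"),
    }

# Example: definition B (first ECG to PCI) for one hospital's patient list
# score_b = indicator_score(hospital_patients, "first_ecg", "pci")
```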

Results

Patient data

Secondary data were used from two university hospitals and five tertiary teaching hospitals performing PCI. Bed capacity in these hospitals ranged from 400 to over 1100. Initially, 4471 records were reviewed for inclusion. After excluding records of patients not diagnosed with STEMI or meeting one of the other exclusion criteria (n = 3454), 1017 records were available for analysis, ranging from 112 to 236 included records per hospital.

Outcome measures

Data quality

The chart abstractors reported that only some hospitals recorded all the data elements required to calculate the performance indicator scores. Moreover, automated access to these data was not possible in most cases. The most common ways to access the data were manual or partly automated access (four of the seven hospitals). Fully automated access was not available for any of the data elements, illustrating that data collection was time-consuming and costly.

For all available and accessible data, we noted where the information was found (Table 2). For data elements with partly automated or manual access, the chart abstractors had to review a combination of medical records, nurse records, discharge letters, electrocardiograms (ECGs), procedure letters and correspondence with other health care professionals, whether in paper form, scanned or stored in the hospital information system. Table 2 illustrates that the accessibility of data differed not only between hospitals, but also between time points within hospitals.

Table 2 Data accessibility per hospital.

The completeness of the available information is illustrated in Fig. 3. The time of first contact was recorded for 24 % of patients, the time of the first ECG for 88 %, the time of arrival at the PCI centre for 51 %, the time of sheath insertion for 94 % and the time of first intervention for 64 %. Hospitals thus varied greatly in the completeness of recording, particularly with respect to the time of first contact.

Fig. 3

Completeness of time points per hospital.

Influence of definition on indicator scores

Table 3 shows the percentage of patients satisfying the indicator threshold for each of the definitions and each of the hospitals. Indicator B could be reported most completely, with 15–50 % missing data across hospitals. Missing data for indicators A, C and D were generally over 50 %, ranging from 21 to 100 %. When calculable, indicator scores ranged from 57 to 100 % within a given hospital, depending on the indicator definition.

Table 3 Time to PCI indicator: % of patients with missing data and number of times 90 min indicator was reached (n yes; n total) per definition per hospital.

Discussion

This study illustrates that, without laborious manual review, hospital scores on the treatment delay performance indicator are largely incomparable.

Three factors contribute to this incomparability. First, definitions of treatment delay performance indicators vary across the literature, which leads hospitals to vary in the extent to which different time points are recorded and/or used for calculating performance indicators. These differences are also due to the low numbers of patients and to missing data. This is partly due to the choices hospitals make regarding which times to record, but also to the format in which organisations compel hospitals to report indicators (as a percentage or a mean). Comparing indicator definitions only among patients for whom all data points are available would be methodologically sound. In practice, however, information was not available for all data points in any of the patients, as hospitals use different definitions of treatment delay and vary greatly in the extent to which the necessary data are available, accessible and complete. This leads to substantially different indicator scores, especially between definitions A and B on the one hand and definition D on the other. Second, the chart abstractors reported that only some hospitals recorded all the data elements required to calculate the performance indicators, and that the data could not be retrieved easily in any of the hospitals. Moreover, data accessibility varied not only between hospitals, but also between data elements within hospitals. The same hospital could therefore have a relatively low indicator score under one definition and a relatively high score under another. Third, we found large variations between hospitals in the completeness of time records.

Previous studies on the comparability of medical data in the Netherlands and across Europe similarly showed that the data elements required for performance indicators were generally poorly available, poorly accessible and incomplete [3, 16, 17, 18]. This may partly be due to the enormous number of indicators hospitals have to report on for external quality control. To compare indicator scores among hospitals, it is therefore necessary to standardise definitions and record data uniformly, and possibly to reduce the number of indicators that hospitals have to measure accurately [19, 20].

To obtain structured data, predefined computer-based forms that record relevant procedures and findings in a structured, standardised format have been shown to be advantageous [21]. One way to convert the currently used free text into a more structured format is the use of natural language processing tools. However, as most tools are developed for English, further research is required on how to handle Dutch-language records. Moreover, to enhance the correctness of data items and thus the efficiency of secondary use of data, the Netherlands Federation of University Medical Centres is detailing how best to apply the 'collect once, use many times' principle [22]. A next step could be to extract data quality items automatically from the hospital information system, have them checked by a responsible party and submit them to quality registers or other authorised parties [20]. Ideally, data coded with comprehensive controlled clinical terminologies such as SNOMED CT can be reused automatically. In the Netherlands, an action plan was recently developed to create a standardised continuity of care record for Dutch hospitals and to create semantically sound subsets of terminologies using SNOMED CT and ICD-10 [20]. Moreover, the USA initiated a nationwide taskforce, Meaningful Use of Complex Medical Data, to overcome problems in analysing large amounts of medical data in a timely fashion [23]. Today, hospital performance data can be linked to national mortality databases to provide information on long-term outcomes and survival, provided data can be tracked across providers, which is facilitated by unique person identifiers [24]. Such a national registry is not available for acute coronary syndromes in the Netherlands, whereas it has existed for many years in other countries, such as Sweden and the UK [25]. Given these advances, performance indicators based on administrative data could become a very useful tool to flag opportunities for quality improvement in hospitals. These propositions, however, do not provide practitioners with a direct, simple solution; they outline steps that need to be taken to prevent incomparability in the future. Hospital associations in the Netherlands are now working on these steps. Despite the lack of immediate solutions, we feel it is important to inform practice that, without laborious manual review, hospital scores on the treatment delay performance indicator are largely incomparable.

Our study has several limitations. The time points extracted to calculate indicator scores per hospital may overestimate data completeness compared with indicator scores calculated and supplied by the hospitals themselves, because the data were extracted by chart abstractors who went to great lengths to obtain them. Moreover, the data obtained by our chart abstractors may deviate from hospital data, as the abstractors made decisions to clarify which data were necessary to calculate the performance indicator scores, such as manually checking all diagnoses in the discharge letter on the basis of the diagnosis and procedure codes. Also, the presence of researchers collecting data on site and the provision of performance feedback may have influenced the documentation of times and the performance indicator scores. However, as the patient safety program for which the data were primarily collected was designed to improve guideline adherence and provide hospitals with feedback on their own performance, it would not have been appropriate to withhold this information. A further limitation is the secondary use of data that were obtained for the purpose of measuring guideline adherence. For example, the exclusion of uninformative charts means that the data were preselected on quality. In spite of these limitations, our results show that the comparability of indicator scores is influenced by data quality issues.

Conclusion

In sum, hospitals use different definitions of this one particular quality indicator and vary greatly in the extent to which the necessary data are actually available, accessible and complete, impeding comparability between hospitals. It is important to increase awareness among developers, users and producers of performance indicators regarding the impact of variations in indicator definitions and data quality on indicator scores.