Key Points

In this population-based, multinational post-authorization safety study to evaluate cardiovascular safety in initiators of prucalopride versus a matched cohort of polyethylene glycol 3350 (PEG) initiators, major adverse cardiovascular events (MACE) endpoints were identified using electronic algorithms.

Validation of MACE endpoints followed a common protocol, adapted for each of three United Kingdom (UK) data sources: Clinical Practice Research Datalink (CPRD GOLD), The Health Improvement Network (THIN), and Information Services Division (ISD) Scotland.

Validation occurred through direct confirmation via linkage to hospital records, requests for additional clinical information, manual review of potential events retrieved by electronic algorithms, and adjudication of potential events by clinicians who were blinded to exposure.

Of 260 potential MACE events identified by the electronic algorithms (108 from CPRD GOLD, 79 from THIN, and 73 from ISD), 100 were considered actual events after identification of confirmed events, review, and the adjudication process. Given the limitations of electronic algorithms to identify cardiovascular outcomes, validation with clinical review is essential.

1 Introduction

Prucalopride (Resolor) is a 5-hydroxytryptamine receptor type 4 (5-HT4) agonist approved for the treatment of chronic constipation [1]. On regulatory request, a post-authorization safety study (PASS) was conducted in 2016–2017 to assess the cardiovascular safety of prucalopride by comparing the occurrence of cardiovascular events among initiators of this drug and among a matched comparator cohort of polyethylene glycol 3350 (PEG) initiators [2, 3].

The study was conducted in five European data sources: the Clinical Practice Research Datalink (CPRD GOLD), the Health Improvement Network (THIN), and the Information Services Division (ISD) Scotland in the United Kingdom (UK); the German Pharmacoepidemiological Research Database (GePaRD); and the Swedish National Registries (SNR) [4]. The primary endpoint of the PASS was occurrence of a major adverse cardiovascular event (MACE), defined as first occurrence of any of the following endpoints: hospitalization for nonfatal acute myocardial infarction (AMI), hospitalization for nonfatal stroke, or in-hospital cardiovascular death.

Potential clinical endpoints during the study period were identified using electronic algorithms in the five databases. Although the validity of MACE endpoint definitions has been established in databases from both Europe and the United States [5,6,7,8], validation of study endpoints by clinical experts was planned a priori as part of the prucalopride PASS in the three databases for which this was expected to be feasible during the study period: CPRD GOLD, THIN, and ISD Scotland.

Here, we describe the methodology and results of the event identification and validation for MACE in the three UK data sources conducted as part of the prucalopride multidatabase PASS. The PASS aimed to provide standardized estimates of incidence rates and incidence rate ratios of MACE among initiators of prucalopride compared with PEG initiators, and methods and results for the PASS are reported elsewhere [2, 3].

2 Methods

2.1 PASS Setting, Data Sources, and Study Design

As noted previously, the research methods and study results of the prucalopride PASS have been reported in detail elsewhere [2, 3]. Briefly, this was a population-based, observational, retrospective cohort study of adult patients with chronic constipation initiating either prucalopride or PEG (matched on a 1:5 ratio), from the UK, Sweden, and Germany. Cohorts of new users were followed from date of first prescription of study drug until first MACE (i.e., hospitalization due to AMI or stroke or in-hospital cardiovascular death), death due to other reasons, end of data collection, or end of study period, whichever occurred first. The study compared the risk of MACE in prucalopride users versus PEG users, pooling the information from five large health care databases: CPRD GOLD, THIN, ISD Scotland, GePaRD, and SNR.
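The follow-up rule described above can be illustrated with a minimal sketch, assuming one record per new user with optional dates for each censoring reason. The function name, argument names, and the tie-breaking order (an event on the same day as an administrative end date counts as the event) are assumptions made for illustration and are not taken from the study protocol.

```python
from datetime import date
from typing import Optional, Tuple


def end_of_follow_up(
    index_date: date,                 # date of first prescription of study drug
    first_mace: Optional[date],
    other_death: Optional[date],
    end_of_data: Optional[date],
    end_of_study: Optional[date],
) -> Tuple[date, str]:
    """Return the earliest follow-up end date and the reason for it."""
    # Listed in priority order; the position is used only to break same-day ties
    # (an assumption of this sketch, not a rule stated in the text).
    candidates = [
        (first_mace, "MACE"),
        (other_death, "death from other causes"),
        (end_of_data, "end of data collection"),
        (end_of_study, "end of study period"),
    ]
    dated = [
        (d, rank, reason)
        for rank, (d, reason) in enumerate(candidates)
        if d is not None and d >= index_date
    ]
    if not dated:
        raise ValueError("no follow-up end date on or after the index date")
    d, _, reason = min(dated)
    return d, reason
```

For example, a patient with a first MACE before the end of data collection would be followed only up to the MACE date; all dates used to call the function are hypothetical.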

Event validation for the primary study endpoint was performed only in the UK data sources. Validation via medical record review was not performed in the SNR or GePaRD for this study. However, clinical data from the SNR and GePaRD have been previously validated and determined to be of good quality [9,10,11,12]. The characteristics of the three UK databases (CPRD GOLD, THIN, and ISD Scotland) have been reported previously [3]. Briefly, CPRD GOLD is a database derived from electronic medical records from general practices and includes linkage to Hospital Episode Statistics (HES) for inpatient data and to Office for National Statistics (ONS) data for cause-of-death data for some participating practices. THIN is a primary care database with no link to ONS or HES, but with access, at the time of this study, to anonymized free-text comments entered by treating general practitioners (GPs). ISD Scotland is a data source derived from the linkage of routinely collated dispensing, hospitalization, and death certification data.

2.2 PASS Endpoint Definitions

The primary endpoint of the prucalopride PASS was first occurrence of a MACE within the study period (2010–2016), defined as the composite of hospitalization for nonfatal AMI [13, 14], hospitalization for nonfatal stroke (either ischemic or hemorrhagic) [15], and in-hospital cardiovascular death. For the latter, a very broad definition was used, aimed at ensuring that any potential in-hospital cardiovascular death would be detected as a potential study endpoint. This definition included deaths from AMI, stroke, heart failure, cardiovascular hemorrhage, sudden cardiac death, cardiovascular procedures (coronary revascularizations), and other cardiovascular deaths [14]. Individual components of MACE were also evaluated separately as secondary study endpoints.

2.3 Electronic Ascertainment of Potential Events

Potential study events were identified in all databases using modified versions of previously reported automated algorithms [16,17,18,19,20], which included diagnosis and procedure codes (International Classification of Diseases [ICD] codes and/or codes from local dictionaries). Supplementary Table S1 summarizes the codes used to define each study endpoint (see the electronic supplementary material). Compared with the published definitions, the algorithms underwent minor modifications after clinical review by study investigators to harmonize them across sources and to account for limitations in data availability.

The diagnosis codes were combined with additional qualifiers (e.g., hospital admission components) in operational definitions that also provided a preliminary, automated classification of the study endpoints as “confirmed,” “definite,” “probable,” or “possible” cases, as well as “unknown cause of death” specifically for fatal events (Table 1).

Table 1 Operational definitions used in the electronic algorithms for case ascertainment and used as guidelines by clinical reviewers for event adjudication purposes

In each data source, hospitalizations for nonfatal AMI and nonfatal stroke were identified by hospital discharge diagnosis codes in patients discharged alive from the hospital. In-hospital cardiovascular deaths were identified through primary cardiovascular discharge diagnoses or the underlying cause-of-death codes when these data were available (CPRD GOLD and ISD Scotland). Patients admitted to the hospital for a noncardiovascular cause, regardless of whether it resulted in a cardiovascular cause of death, were initially considered potential cases, and further information was collected, if available. However, if there was no clear cardiovascular cause of hospitalization, the event was not considered a relevant outcome for this study. In THIN, because linkage to HES/ONS was not possible, any (i.e., all-cause) in-hospital deaths were considered for review at this stage, and a preliminary additional manual plausibility evaluation of these death events was conducted before considering them potential cardiovascular death study endpoints.

2.4 Case Validation

After all potential study endpoints (including all occurrences during the study follow-up period, not just first occurrence) from the three UK data sources were identified electronically (and given a preliminary classification), duplicate cases between THIN and CPRD GOLD were removed from the THIN data source (as described in Fortuny et al. [3]). The approach taken to ensure removal of duplicate patients involved (1) eliminating all the Scottish practices from the CPRD and THIN, since they were also included in the ISD Scotland database, and (2) excluding potentially duplicated practices in THIN and CPRD. Using a modified six-step algorithm based on patient-level demographic and pharmacy data [21], we identified patients thought to be the same individual in the CPRD and THIN without compromising patient or practice confidentiality. In case of duplication, CPRD practices were retained and their equivalents in THIN were removed.

Validation was conducted for all potential study endpoints. Validation occurred per the following four general steps: (1) identification of electronically confirmed cases and direct confirmation via linkage to hospital records (CPRD GOLD only); (2) request for additional clinical information through questionnaires (CPRD GOLD), free-text (THIN), or original hospital case records (ISD); (3) manual review by study investigators of the preliminary patient profiles (e.g., database listings of outpatient visits, procedures, and medications) of events retrieved by the algorithm (CPRD GOLD/THIN) to rule out noncases; and (4) event adjudication by the prucalopride PASS adjudication committee (three clinicians, all blinded to exposure). This common validation protocol had to be adapted to the resources and type of data available in each data source.

2.4.1 Case Validation in the CPRD GOLD

In CPRD GOLD (Fig. 1), potential cases were considered confirmed (with no additional validation needed) through hospital discharge diagnoses from HES data if any of the following occurred: (1) for nonfatal events, the primary hospital discharge diagnosis codes were consistent with the list of codes from the study protocol (Supplementary Table S1, see the electronic supplementary material); and (2) for in-hospital deaths, the death occurred in the hospital (identified in HES) and the underlying cause-of-death codes from ONS were consistent with the list of codes from the study protocol (see Supplementary Table S1).
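The two confirmation criteria above can be expressed as a short rule. The following is a minimal sketch, assuming a simplified event record; the field names are hypothetical, and the ICD-10 prefixes are illustrative placeholders only, not the code list in Supplementary Table S1.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder ICD-10 prefixes for illustration only.
# These are NOT the study's Supplementary Table S1 code list.
PROTOCOL_CODE_PREFIXES = ("I21", "I61", "I63")


@dataclass
class PotentialEvent:
    fatal: bool                                     # death event vs. nonfatal AMI/stroke
    linked_to_hes_ons: bool                         # linkage exists only for some practices
    hes_primary_discharge_dx: Optional[str] = None  # primary HES discharge diagnosis
    died_in_hospital: bool = False                  # place of death identified in HES
    ons_underlying_cause: Optional[str] = None      # ONS underlying cause of death


def in_protocol_list(code: Optional[str]) -> bool:
    """True if the code starts with one of the (placeholder) protocol prefixes."""
    return code is not None and code.upper().startswith(PROTOCOL_CODE_PREFIXES)


def is_electronically_confirmed(event: PotentialEvent) -> bool:
    """Apply the two confirmation criteria described in the text."""
    if not event.linked_to_hes_ons:
        return False  # without linkage, the event goes on to further validation
    if event.fatal:
        # (2) death occurred in hospital AND ONS underlying cause is in the list
        return event.died_in_hospital and in_protocol_list(event.ons_underlying_cause)
    # (1) nonfatal event: primary HES discharge diagnosis is in the list
    return in_protocol_list(event.hes_primary_discharge_dx)
```

Events for which this rule returns False are not noncases; in the process described here they proceed to the GP questionnaire and review steps.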

Fig. 1

Validation process in CPRD GOLD. AMI acute myocardial infarction, CPRD Clinical Practice Research Datalink, GP general practitioner, HES Hospital Episode Statistics, ICD-10 International Statistical Classification of Diseases and Related Health Problems, 10th Revision, ONS Office for National Statistics, QC quality control. (a) Approximately 50% of patients are linkable to HES and ONS. (b) Definite cases were reassigned as “confirmed” and unknown cases were reassigned as “noncases”; the unknown case classification applied only to cardiovascular deaths

For nonconfirmed potential cases, questionnaires were sent to the treating GPs. GPs were asked to review the medical records in their possession, including hospital discharge reports, autopsy reports, death certificates (if available to the GP), and free-text, and to complete a specific questionnaire designed for each endpoint. Review of free-text comments was also originally planned, but these became unavailable during the study because of data privacy restrictions implemented throughout CPRD GOLD.

Using the information included in automated patient profiles and in available GP questionnaires, a preliminary adjudication/classification was conducted by trained epidemiologists so that obvious noncases could be excluded from the subsequent adjudication review. The investigators were blinded to exposure status, and the classifications and criteria used were the same as the ones used by the electronic algorithms for their preliminary classification, plus “noncase” (i.e., potential cases not meeting any of the prior definitions). Potential cases classified as noncases by one epidemiologist were reviewed by a second one and, when in agreement, were not considered as a case in the study. The remaining potential cases were reviewed by the study adjudication committee.

2.4.2 Case Validation in THIN

In THIN (Fig. 2), linkage to HES or ONS was not feasible for this study because of the limited number of linked practices and the long lag of data updates. Therefore, there were no electronically confirmed cases in this data source. As discussed previously, a preliminary plausibility evaluation (manual review of individual patient profiles) of all death events identified was conducted in THIN; death events for which there were conclusive records of an alternative, noncardiovascular etiology were removed from further validation steps and considered “noncases.”

Fig. 2

Validation process in THIN. AMI acute myocardial infarction, CV cardiovascular, GP general practitioner, THIN The Health Improvement Network. (a) Definite cases were reassigned as “confirmed” and unknown cases were reassigned as “noncases”; the unknown case classification applied only to cardiovascular deaths

GP questionnaires were not used in THIN. Instead, for all potential cases, any free-text comments (i.e., notes made by physicians regarding the patient encounter, for up to 6 months before and after the qualifying diagnosis code or information on the cause of death) were requested and manually reviewed. As in CPRD GOLD, blinded patient profile review by trained epidemiologists was conducted, allowing for the exclusion of additional obvious noncases. The remaining potential cases of MACE were sent to adjudication.

2.4.3 Case Validation in ISD Scotland

In ISD Scotland (Fig. 3), all events were automatically considered potential, and no cases were initially considered confirmed. Medical record abstraction was conducted for all potential cases using chart abstraction forms completed by a team comprising a research nurse and a study doctor. All potential cases underwent review by the study adjudication committee, except for three for which clinical data could not be obtained.

Fig. 3

Validation process in ISD Scotland. If medical record abstraction (second green box) was not possible for a given patient, case status was assigned based on classification by the electronic algorithm. AMI acute myocardial infarction, ICD-10 International Statistical Classification of Diseases and Related Health Problems, 10th Revision, ISD Information Services Division, NRS National Records of Scotland

2.5 Event Adjudication and Final Case Classifications

An adjudication committee of three clinical experts from the research partner institutions involved in validation, all blinded to the study exposure, reviewed all nonconfirmed, nonexcluded potential cases from the three data sources and determined their final status using all clinical information available. Scorecards were used to document final classifications and their rationale. Clinical reviewers attended a training session in which they were given detailed guidance on clinical definitions of study endpoints, on the use of scorecards, and on the levels of certainty to be used for classifying study endpoints [i.e., definite, probable, and possible case; noncase; and unknown cause of death (Table 1)].

Two reviewers independently evaluated each potential case and each provided a classification. In the event of disagreement, a third reviewer also evaluated the event, followed by discussion by the full adjudication committee until a final classification was reached by consensus. Only those potential cases with a final classification of “confirmed” or “definite” after validation were included in the PASS main analysis as MACE cases, whereas the remaining were considered noncases. In a sensitivity analysis, potential cases classified by adjudicators as “probable” (and for death events, as “possible”) were also considered to be cases.
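The adjudication and case-inclusion rules in this section can be summarized in a short sketch. The function names and the representation of classifications as plain strings are assumptions for illustration; the consensus step is modeled simply as a label supplied after the third review and committee discussion.

```python
from typing import Optional

# Levels of certainty available to adjudicators (Table 1).
ADJUDICATION_LEVELS = {
    "definite", "probable", "possible", "noncase", "unknown cause of death",
}


def adjudicate(first: str, second: str, consensus: Optional[str] = None) -> str:
    """Final classification from two independent reviews.

    If the two reviewers agree, their shared label is final. Otherwise a
    consensus label (reached after third review and committee discussion)
    must be supplied.
    """
    for label in (first, second):
        if label not in ADJUDICATION_LEVELS:
            raise ValueError(f"unknown classification: {label!r}")
    if first == second:
        return first
    if consensus is None:
        raise ValueError("disagreement: third review and committee consensus required")
    if consensus not in ADJUDICATION_LEVELS:
        raise ValueError(f"unknown classification: {consensus!r}")
    return consensus


def is_case(final: str, is_death: bool, sensitivity: bool = False) -> bool:
    """Case status for the main analysis or the sensitivity analysis."""
    if final in {"confirmed", "definite"}:  # "confirmed" comes from record linkage
        return True
    if not sensitivity:
        return False
    # Sensitivity analysis: "probable" also counts, and "possible" for deaths only.
    return final == "probable" or (is_death and final == "possible")
```

Under these rules, an event with an "unknown cause of death" classification is a noncase in both analyses.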

2.6 Statistical Analyses

We described the flow of the potential cases of the primary study endpoint, from initial electronic identification to final classification after adjudication review, overall and by data source. In CPRD GOLD, we also described the response rates among the GP questionnaires sent to valid practices (active practices accepting questionnaires).

2.7 Research Ethics

All relevant authorities reviewed and approved the study on ethical grounds in the UK: the Independent Scientific Advisory Committee for CPRD GOLD; the Scientific Review Committee for THIN; and the East of Scotland Research Ethics Services and the Public Benefit and Privacy Panel for Health and Social Care for ISD, which allowed access to hospital case records for regulatory purposes for the first time. Any clinical information used for validation purposes was de-identified.

3 Results

3.1 Prucalopride PASS Population from the UK

The prucalopride PASS initially included 16,426 unique participants from the UK before matching and trimming: 5710 individuals from the CPRD GOLD, 3222 from THIN, and 7494 from ISD Scotland. The characteristics of each of these cohorts, overall and by treatment group, have been described elsewhere [2, 3].

3.2 Potential MACE Cases Identified by Electronic Algorithms

Figures 1, 2, 3, and 4 present the flow of the potential study endpoints, from initial identification to final classification, by database as well as overall. The electronic algorithms identified 260 potential events of the primary study endpoint in the three UK data sources: 108 identified in the CPRD GOLD, 79 in THIN, and 73 in ISD Scotland. These identified events could occur both within and beyond the risk window specified in the primary analyses.

Fig. 4

Validation flowchart, all UK data sources. CPRD Clinical Practice Research Datalink, CVD cardiovascular disease, ISD Information Services Division, THIN The Health Improvement Network, UK United Kingdom

Of the 260 potential MACE events identified by the electronic algorithms, 38 cases were considered confirmed via linkage to hospital records (CPRD GOLD only), 56 were ruled out as clear non–cardiovascular death cases (THIN) after review of patient records of all identified deaths, and three were not available for further review (ISD); the remaining 163 events were considered potential cases at this point. After manual review with additional information, 45 were considered noncases (CPRD GOLD and THIN). In the final adjudication of the 118 remaining potential MACE cases, 62 were adjudicated as definite, 10 as probable, 13 as possible, and 33 as noncases (Fig. 4). The GP questionnaire response rate in CPRD GOLD was 80%, free-text was available for all potential events from THIN, and all but three requested hospital case records from ISD were retrieved. This was the first observational study in Scotland in which access to hospital case records was granted.
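As a worked check, the counts reported in this paragraph can be reconciled step by step. The sketch below uses only figures stated in the text; the variable names are chosen for illustration.

```python
# Potential events identified by the electronic algorithms, by data source.
identified = {"CPRD GOLD": 108, "THIN": 79, "ISD Scotland": 73}
total_identified = sum(identified.values())

confirmed_by_linkage = 38        # CPRD GOLD only
ruled_out_noncv_deaths = 56      # THIN plausibility evaluation
unavailable_for_review = 3       # ISD Scotland

# Events still considered potential cases after the first step.
potential_cases = (
    total_identified
    - confirmed_by_linkage
    - ruled_out_noncv_deaths
    - unavailable_for_review
)

noncases_after_manual_review = 45  # CPRD GOLD and THIN
sent_to_adjudication = potential_cases - noncases_after_manual_review

adjudication_results = {"definite": 62, "probable": 10, "possible": 13, "noncase": 33}

# Cases in the main analysis: confirmed by linkage plus adjudicated as definite.
main_analysis_cases = confirmed_by_linkage + adjudication_results["definite"]

assert total_identified == 260
assert potential_cases == 163
assert sent_to_adjudication == 118
assert sum(adjudication_results.values()) == sent_to_adjudication
assert main_analysis_cases == 100
```

Each assertion corresponds to a number given in the text, so the flow from 260 potential events to 100 events counted in the main analysis is internally consistent.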

3.3 Validation Flow in the CPRD GOLD

Of the 108 potential cases identified, 38 were automatically confirmed (see criteria for automatic confirmation above) and did not undergo further evaluation (Fig. 1, Table 3). A total of 50 questionnaires were sent to valid practices, with a response rate of 80%. Table 2 shows the response rate for GP questionnaires by type of study outcome. From patient profile review and questionnaire information, 43 were classified as noncases, and the remaining 27 potential cases were reviewed by the adjudication committee. Of these, ten were classified as definite MACE events.

Table 2 Description of the general practitioner questionnaire process in CPRD GOLD

3.4 Validation Flow in THIN

Of the 79 electronically identified cases (Fig. 2, Table 3), an initial plausibility evaluation was performed among all deaths before considering them as potential MACE cases. This resulted in ruling out 56 deaths as in-hospital cardiovascular deaths (these did not undergo further validation). Free-text comments were requested for validation purposes for the remaining 23 potential cases. Patient profile review with free-text identified two additional noncases (originally considered as one potential AMI and one potential stroke), and the remaining 21 potential cases were sent to adjudication. Of these, 12 potential MACE cases were adjudicated as definite.

Table 3 Validation of study outcomes by data source

3.5 Validation Flow in ISD Scotland

Of the 73 potential cases identified in Scotland (Fig. 3, Table 3), medical chart abstraction could be performed for all except three potential cases (which did not undergo further validation). Thus, 70 potential cases from ISD Scotland were reviewed by the adjudication committee, 40 of which were classified as definite MACE cases.
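The per-source flows in Sects. 3.3 to 3.5 can be tabulated to show how many electronically identified events were retained as cases in the main analysis. This is a minimal sketch using only counts stated in the text; the dictionary keys are illustrative, and the resulting proportion is simple arithmetic, not a formal positive predictive value reported by the study.

```python
# Counts per data source as reported in the text.
# "removed_before_adjudication" combines early exclusions: noncases from
# plausibility evaluation or profile review (CPRD GOLD, THIN) and events
# without retrievable clinical data (ISD Scotland).
flows = {
    "CPRD GOLD": {"identified": 108, "confirmed": 38,
                  "removed_before_adjudication": 43, "adjudicated": 27, "definite": 10},
    "THIN": {"identified": 79, "confirmed": 0,
             "removed_before_adjudication": 56 + 2, "adjudicated": 21, "definite": 12},
    "ISD Scotland": {"identified": 73, "confirmed": 0,
                     "removed_before_adjudication": 3, "adjudicated": 70, "definite": 40},
}

summary = {}
for source, f in flows.items():
    # Every identified event is confirmed, removed early, or adjudicated.
    assert f["confirmed"] + f["removed_before_adjudication"] + f["adjudicated"] == f["identified"]
    retained = f["confirmed"] + f["definite"]  # cases in the main analysis
    summary[source] = (retained, round(100 * retained / f["identified"], 1))

for source, (retained, pct) in summary.items():
    print(f"{source}: {retained} of {flows[source]['identified']} retained ({pct}%)")
# CPRD GOLD: 48 of 108 retained (44.4%)
# THIN: 12 of 79 retained (15.2%)
# ISD Scotland: 40 of 73 retained (54.8%)
```

The retained counts sum to the 100 events considered actual MACE after validation.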

4 Discussion

In the UK component of the prucalopride multidatabase PASS, which included 16,426 unique participants prior to matching and trimming and after de-duplication from three data sources, local adaptations were needed to successfully implement a common event ascertainment and validation protocol. This included patient profile review and use of GP questionnaires in the CPRD GOLD; preliminary plausibility evaluation, patient profile review, and review of free-text comments in THIN; and medical record abstraction in ISD Scotland. Of the 260 potential MACE events identified by the electronic algorithms (108 from CPRD GOLD, 79 from THIN, and 73 from ISD), 100 were considered actual events after identification of confirmed events, review, and the adjudication process, with the CPRD GOLD being the largest contributor.

It is important to note that the definition used to identify potential causes of in-hospital cardiovascular death was, per regulatory request, very broad. This likely inflated the number of initial potential cases identified by the algorithms. Researchers conducting this type of study need to balance the sensitivity of their endpoint definitions (very sensitive definitions yield very few “false negatives” but many “false positives”) against the resources available for case validation, which may be insufficient if the number of potential events identified is large. In the prucalopride PASS, because the study population was young and death rates were low during the study period, using a very sensitive definition generated a reasonable number of potential events to review. In addition, the preadjudication patient profile review identified many obvious noncases, which markedly reduced the burden on the adjudication committee. Furthermore, to ensure the capture of all MACE, all potential events identified were subject to validation, not only those captured during the primary event risk window.

Additional factors may have contributed to the difference in identification and case confirmation proportions, particularly for in-hospital cardiovascular death events. In each data source, some sources of clinical information were lacking: for the CPRD GOLD, free-text comments were not available for validation purposes, and the response rate of the GP questionnaires was slightly lower than anticipated. Thus, for some potential CPRD GOLD cases, the patient profile was the only source of information for adjudicators to review. Linkage to the HES and ONS was possible in CPRD GOLD for approximately 50% of the patients. In THIN, no GP questionnaires were obtained and no linkage to the HES or ONS was possible. However, free-text comments were available for manual review for all potential cases. These features may have limited the ability of clinical reviewers to adjudicate the study events, particularly regarding assignment of cause of death events, for which the information included in the patient profiles/medical records was often very limited. This resulted in many events classified as “unknown” cause of death, which were considered noncases for analysis purposes. In CPRD GOLD, in-hospital cardiovascular deaths comprised the majority of potential events identified by the algorithms compared with the other data sources.

As mentioned previously, prior published MACE clinical definitions were used to create electronic algorithms to ascertain cases in health care databases [13,14,15,16,17,18,19,20]. A sensitivity analysis was performed that relaxed the required alignment with the algorithmic definitions used in the study. In this sensitivity analysis, where potential cases classified by adjudicators as “probable” (and for death events, as “possible”) were also considered to be cases, the results were consistent with those from the main analysis. Key lessons learned from this event validation exercise include the following: (1) the need to perform validation of outcomes in order to substantiate the validity of the study outcomes used in the analyses using these data sources; (2) the need to locally adapt study protocols to the disparate types of clinical data available in each data source; (3) the importance of using data sources with detailed clinical information for validation purposes, particularly for studies including cardiovascular death events; (4) the value of conducting preliminary review of potential cases in order to rule out obvious noncases, as a means for reducing the burden on the adjudication committee; and (5) the importance of involving clinical expert reviewers in the study validation.

5 Conclusions

A common validation protocol, with local adaptations of search algorithms and validation steps based on the types of clinical information available in each data source, allowed for the validation of MACE endpoints in the prucalopride multidatabase PASS in three UK data sources. Given the limitations of relying solely on computer algorithms to identify cardiovascular outcomes, validation with clinical review is essential to guide interpretation.