Background

Signs and symptoms of acute pulmonary embolism (PE) are non-specific and include coughing, shortness of breath, chest pain, and syncope. Given the potential severity of PE, physicians have a low threshold for referring a patient with suspected PE for diagnostic imaging. As a result, the proportion of confirmed cases among those with suspected PE is low and has decreased over recent years [1]. Some authors have stressed that the threshold for referral for computed tomography pulmonary angiography (CTPA; the reference standard for PE) is too low, exposing too many patients to the risks of contrast nephropathy and radiation-induced cancer [2]. In addition, CTPA may detect small sub-segmental emboli whose clinical relevance remains unclear but which are nonetheless often treated with anticoagulants, conferring a bleeding risk [3]. Rapid and accurate selection of the patients who require CTPA is therefore of paramount importance. The currently recommended diagnostic approach starts with clinical pre-test probability assessment using a validated clinical decision rule (CDR). In patients with a low pre-test probability, a negative D-dimer test can safely obviate referral for imaging in about 30% of suspected patients [4].

Nevertheless, it is increasingly recognized that CDRs and D-dimer testing may not be equally effective and safe in all patient subgroups. First, D-dimer testing has a low specificity, meaning that the test often yields false-positive results, especially in elderly patients with comorbidity, cancer patients, and hospitalized patients, but also in younger patients with a (very) low clinical probability or a history of PE [5,6,7]. To increase the specificity of D-dimer testing, it has been suggested that the D-dimer result be interpreted using a threshold adjusted for age or for clinical pre-test probability [8].

Second, there are important differences in case-mix and healthcare setting (e.g., emergency ward, primary care, secondary care, or nursing home). It has previously been demonstrated that these differences affect the PE prevalence in the suspected population, which in turn influences the predictive performance of CDRs [4]. As a consequence, several CDRs have been developed and validated, each with specific advantages and limitations. Typically, these CDRs have only been validated within the setting in which they were developed, and validation studies across different healthcare settings (each with a different PE prevalence) or across subgroups are limited or non-existent.

Despite the advances made, these issues leave clinicians with uncertainty about the appropriate diagnostic approach for a patient in a specific healthcare setting. Rather than performing a new prospective study to address these issues, an alternative, less costly, and faster approach is to combine individual patient data (IPD) from existing studies. IPD meta-analysis is a powerful method that allows robust model validation and updating across multiple healthcare settings and subgroups [9, 10]. We recently performed such an IPD meta-analysis (IPDMA) for diagnosing deep vein thrombosis (DVT) and for evaluating the validity of CDRs and D-dimer testing in a selected secondary care (referred) population with suspected PE [6, 11]. This paper describes the protocol of a large international IPD meta-analysis for ruling out PE across different subgroups and healthcare settings.

Methods/design

This IPDMA will follow the guidance of the Preferred Reporting Items for Systematic reviews and Meta-Analyses of Individual Participant Data (PRISMA-IPD) Statement [12]. For this protocol paper, we adhere to the Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) Statement, as explained in the Appendix [13].

Study eligibility criteria

Eligible studies must:

  1. Have a prospective or cross-sectional design and include patients with clinically suspected PE.

  2. Report original data on (the method of) pre-test probability assessment and assess the variables needed to calculate at least one prediction rule; studies evaluating only “gestalt” or an implicit pre-test probability assessment, i.e., without use of an objective prediction rule, will be excluded.

  3. Include a clear description of the source of patient enrolment or the clinical healthcare setting; studies including only children or pregnant women are not eligible for inclusion.

  4. Confirm the diagnosis of PE objectively, either with imaging (CTPA, ventilation-perfusion lung scan, or digital subtraction angiography) or by clinical follow-up of at least 1 month in patients who did not receive anticoagulant treatment based on the initial diagnostic testing.

  5. Include at least 50 patients with confirmed PE.

Search strategy

A systematic search was conducted in MEDLINE from January 1, 1995, to August 25, 2016, using a previously developed search string for prediction model development or validation studies [14], combined with terms for pulmonary embolism (see the Appendix). This search string has a high sensitivity for retrieving studies that develop or validate a (formal) clinical prediction model, such as a clinical pre-test probability assessment method for diagnosing PE (e.g., the Wells rule or the PERC model). No language restrictions were applied. Two reviewers (GJG and NK) independently screened titles and abstracts, and subsequently four reviewers (GJG, NK, NvE, and FAK) independently assessed the full-text articles for eligibility. Disagreements were resolved by discussion.

A total of 3145 individual studies were assessed for eligibility, yielding 40 potentially eligible studies. The results of this literature search were discussed during a meeting at the International Society on Thrombosis and Haemostasis conference in Berlin (2017) with the principal investigators of these 40 studies. Each principal investigator checked the retrieved list of studies for completeness and data availability and was asked to suggest additional studies or sources not retrieved by the literature review, provided these studies fulfilled our pre-defined eligibility criteria. This final stage of study identification and individual patient data collection is currently ongoing and will be completed in early to mid 2018, including an update of the search (currently limited to August 25, 2016) to the most recent feasible date. Accordingly, a set of included studies with datasets available at the individual patient level will be assembled to build the final individual patient dataset; see Fig. 1 for the current flow of our search strategy.

Fig. 1 Flowchart of included studies

The systematic review is registered in the PROSPERO database for systematic reviews (ID 89366).

Handling of missing data

To avoid bias induced by ignoring missing data in clinical research, it is widely acknowledged that (multiple) imputation techniques should be considered to replace missing values. For this IPDMA, we consider imputation appropriate in particular under the missing-at-random assumption, i.e., when the probability that a value is missing depends on other observed data. Data in a dataset can be either partially missing (missing for some patients within a study) or systematically missing (missing for all patients in a study). For partially missing data, traditional multiple imputation will be performed within each individual dataset, unless this was already done by the researchers of the respective study (in which case their imputed dataset will be used instead), and provided that the proportion of missing values relative to the total dataset is small enough to allow construction of a robust imputation model. For systematically missing data, more advanced, state-of-the-art imputation methods will be applied where appropriate, preferably imputing systematically and partially missing data within a single imputation model [15]. All statistical analyses described below will be performed only after imputation of missing values, and we will report the proportion of missing values for each dataset included in our IPDMA.
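As a rough illustration of the per-study handling of partially missing predictors, the following minimal sketch uses scikit-learn's IterativeImputer, run repeatedly with posterior sampling to obtain multiple imputed datasets. It is not the protocol's actual pipeline: column and study identifiers are hypothetical, and the more advanced multilevel techniques cited in [15] (which also handle systematically missing variables) are not shown.

```python
# Sketch only: within-study multiple imputation of partially missing predictors.
# Assumes every predictor is observed for at least some patients in each study;
# systematically missing columns require the multilevel methods of reference [15].
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

def impute_per_study(df: pd.DataFrame, predictors: list[str],
                     study_col: str = "study_id", m: int = 10) -> list[pd.DataFrame]:
    """Return m imputed copies of df, imputing within each study separately."""
    imputed_sets = []
    for i in range(m):
        parts = []
        for _, study_df in df.groupby(study_col):
            # sample_posterior=True draws from the predictive distribution,
            # so repeated runs yield distinct imputations (approximate MI).
            imp = IterativeImputer(sample_posterior=True, random_state=i)
            filled = study_df.copy()
            filled[predictors] = imp.fit_transform(study_df[predictors])
            parts.append(filled)
        imputed_sets.append(pd.concat(parts).sort_index())
    return imputed_sets
```

Analyses would then be run on each of the m imputed datasets and the results pooled, in line with standard multiple imputation practice.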

Clinical decision rules under evaluation

The main advantage of clinical pre-test probability assessment is that it can be performed at the bedside. Thus, the CDR is based on readily available information on the patient’s medical history and physical examination, followed by D-dimer testing when needed. Recently, Hendriksen and colleagues performed a systematic literature review with the purpose of identifying available and easily applicable CDRs for suspected PE [16]. We aim to validate—and update if needed—these CDRs and add a novel CDR that was published after the aforementioned literature review (i.e., the YEARS rule) [17]. These clinical decision rules are summarized in Table 1.

Table 1 Clinical decision rules under evaluation
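For illustration only, the sketch below shows how one of the rules listed in Table 1 (the dichotomized Wells rule, with its commonly published items and point values) might be scored programmatically. Table 1 remains the authoritative definition of each CDR and its rule-out threshold; the item names used here are hypothetical.

```python
# Commonly published Wells items and points; illustrative, not the protocol's definition.
WELLS_ITEMS = {
    "clinical_signs_dvt": 3.0,         # clinical signs/symptoms of DVT
    "pe_most_likely": 3.0,             # alternative diagnosis less likely than PE
    "heart_rate_gt_100": 1.5,
    "immobilization_or_surgery": 1.5,  # immobilization >= 3 days or recent surgery
    "previous_vte": 1.5,               # previous DVT or PE
    "hemoptysis": 1.0,
    "active_malignancy": 1.0,
}

def wells_rule_out(patient: dict, d_dimer_negative: bool) -> bool:
    """True if PE is considered ruled out: Wells score <= 4 ('PE unlikely') plus a negative D-dimer."""
    score = sum(points for item, points in WELLS_ITEMS.items() if patient.get(item, False))
    return score <= 4 and d_dimer_negative
```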

Statistical analyses and objectives addressed by this review

We acknowledge that, with emerging knowledge and evidence, additional questions may arise that could be addressed within this IPDMA, leading to potential amendments to this protocol. Should such amendments be necessary, they will be explained thoroughly in the respective future publications. For now, we aim to address the following three clinically important research domains related to the management of suspected PE:

Research domain 1: what is the optimal method for assessing clinical pre-test probability across different healthcare settings?

The optimal method of pre-test probability assessment is likely to differ across healthcare settings, owing to intrinsic differences in patient characteristics and in the prevalence of PE in the suspected population. For instance, in open-access emergency care the PE prevalence is typically 5% or less, compared with 20–30% in an in-hospital setting [4]. Whereas CDRs with moderate-to-high sensitivity but high specificity may be preferred in the former setting, only highly sensitive CDRs, even at the cost of moderate specificity, are acceptable in the latter.

Studies will be categorized in the following healthcare settings, dependent on the overall PE prevalence as well as the clinical context in which the study is performed:

  I. Open-access emergency care: patients present themselves, typically without referral, to an emergency care department. The overall prevalence of confirmed PE is 5% or less.

  II. Primary healthcare: patients are seen at an outpatient clinic, usually by a general physician, family doctor, or general internist, who decides on the need for further referral based on contextual knowledge, clinical pre-test probability assessment, and D-dimer testing. The overall PE prevalence is usually 5–15%.

  III. Emergency ward or hospital-care setting: this setting differs from open-access emergency care in that the target population is referred based on a clear suspicion of acute PE, usually by a family doctor or general internist. The overall prevalence is usually 15–25%.

  IV. In-hospital or nursing home setting: this setting covers both hospitalized in-patients with acute disease or after surgery and old, frail institutionalized patients cared for in a long-term clinical setting. The overall prevalence of PE is typically high, e.g., > 25%.

In each setting, we aim to validate all clinical decision rules summarized in Table 1. Discrimination will be quantified using the concordance (c-)statistic and visualized by plotting a ROC curve. Calibration will be illustrated graphically in a calibration plot and quantified by the calibration slope in this plot as well as by the expected versus observed ratio (good calibration implying that both equal, or at least approach, 1). Finally, the diagnostic indices of each rule (sensitivity, specificity, and predictive values) will be calculated using rule-specific thresholds (i.e., the distinction between “in need of further testing” and “PE considered ruled out”; see Table 1).

The diagnostic indices usually reported in CDR studies of PE are safety and efficiency. Safety is defined as the proportion of patients with a negative strategy (low CDR score and normal D-dimer level) who are nevertheless diagnosed with venous thromboembolism (deep vein thrombosis or [fatal] PE) during follow-up, which is equivalent to 1 minus the negative predictive value. By consensus, the upper bound of the 95% confidence interval of this safety proportion should not exceed 3%. Efficiency is defined as the proportion of all patients in whom PE is ruled out based on a low CDR score and normal D-dimer levels (i.e., the false negatives plus true negatives, relative to all patients). Both efficiency and safety will be calculated for each CDR in each clinical setting.

Between-study heterogeneity and clustering of data in our IPD set will be handled using a two-stage approach: the respective diagnostic indices are first estimated within each study and then meta-analyzed conventionally using a bivariate random-effects approach. This bivariate approach incorporates the correlation between pairs of (logit-transformed) sensitivity and specificity, or predictive values, across studies in a random-effects meta-analysis [9, 10, 18].
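To make the stage-1 quantities concrete, the sketch below computes, for a single study, the c-statistic, calibration slope, observed:expected ratio, safety, and efficiency from hypothetical inputs (a 0/1 PE outcome, a CDR-based predicted probability, and an indicator for a negative rule-out strategy). The second stage, pooling these per-study estimates with a bivariate random-effects model, is not shown and would normally be done with dedicated meta-analysis software.

```python
# Sketch only: per-study (stage-1) performance measures; inputs are hypothetical.
import numpy as np
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

def stage1_metrics(y, p_hat, rule_negative):
    """y: 0/1 PE outcome; p_hat: CDR-predicted probability of PE (in (0, 1));
    rule_negative: True where the strategy rules PE out (low CDR score + normal D-dimer)."""
    y, p_hat, rule_negative = map(np.asarray, (y, p_hat, rule_negative))

    c_stat = roc_auc_score(y, p_hat)                      # discrimination

    # Calibration slope: logistic regression of the outcome on the linear predictor.
    lp = np.log(p_hat / (1 - p_hat))
    slope_fit = sm.GLM(y, sm.add_constant(lp), family=sm.families.Binomial()).fit()
    cal_slope = slope_fit.params[1]

    oe_ratio = y.mean() / p_hat.mean()                    # observed vs expected events
    safety = y[rule_negative].mean()                      # VTE rate despite a negative strategy (1 - NPV)
    efficiency = rule_negative.mean()                     # proportion in whom imaging is withheld
    return {"c": c_stat, "slope": cal_slope, "O:E": oe_ratio,
            "safety": safety, "efficiency": efficiency}
```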

Research domain 2: is the predictive performance of each CDR different in various clinically important subgroups?

We define the following clinically important subgroups: active cancer (as defined in the original publication), history of previous venous thromboembolism, inpatients, age (50+, 70+, etc.), gender, and comorbidities such as heart failure and/or COPD, where available. To assess the impact of these subgroups on the predictive performance of each CDR, a logistic model will be fitted for each CDR, both with and without the (dichotomized) D-dimer result. For this, the original intercept and regression coefficients will be used or, if not available, the total score of the respective CDR. As such, for each patient included in the IPDMA, a predicted probability of PE is estimated using the predictors of the CDR, with and without the D-dimer result.

Next, we will perform a one-stage meta-analysis with a study-wise intercept term (i.e., fixed effects), the logit of the estimated predicted probability (or risk) as an offset term (i.e., no regression coefficient is estimated for this term), and the subgroup covariate as a random effect. If the regression coefficient for this subgroup covariate shows a clinically plausible and statistically significant effect (p value arbitrarily set at 0.10 to 0.15), the conclusion will be that the respective CDR is not well calibrated for this subgroup of patients. In that case, further subgroup effects need to be explored, first by changing the logit of predicted risk from an offset term to a random-effect term to check whether the respective CDR on average calibrates well in our IPDMA (i.e., the mean slope of this covariate should at least approach 1; essentially, this is also tested under research domain 1). Finally, to further quantify subgroup effects, this model is expanded with interaction terms between our pre-defined subgroups and the (logit of) predicted risk (random effect) [10].

With these models, the mean probability of PE will be estimated for each CDR score, separately for each subgroup variable. To illustrate potential heterogeneity, a 95% prediction interval (PI) will be calculated for these estimated PE probabilities. This 95% PI can be interpreted as the range of possible PE probabilities for each CDR score, given the presence or absence of the respective subgroup covariate. Wide 95% PIs thus indicate heterogeneity, warranting further exploration of its causes. As a first step, we will repeat the above analyses in more homogeneous populations, i.e., those described under research domain 1 (open-access emergency care, primary healthcare, referred hospital care, and institutionalized patients). If this indeed yields narrower 95% PIs, the observed heterogeneity can be explained by differences in baseline risk.
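The sketch below illustrates the core idea of the offset-based subgroup check, assuming a CDR-based predicted probability has already been computed for every patient. For tractability it uses fixed study intercepts and a fixed subgroup coefficient; the protocol's one-stage model treats the subgroup covariate as a random effect, which requires mixed-model software. Column names (`pe`, `p_hat`, `study_id`) are hypothetical.

```python
# Simplified sketch: fixed-effects version of the one-stage subgroup calibration check.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def subgroup_calibration_check(df: pd.DataFrame, subgroup: str):
    """df columns: 'pe' (0/1 outcome), 'p_hat' (CDR-predicted risk), 'study_id', and `subgroup` (0/1)."""
    # Logit of the predicted risk enters as an offset, so no coefficient is estimated for it.
    offset = np.log(df["p_hat"] / (1 - df["p_hat"]))

    # Study-wise intercepts (one indicator per study, no overall constant) plus the subgroup term.
    X = pd.get_dummies(df["study_id"], prefix="study").astype(float)
    X[subgroup] = df[subgroup].astype(float)

    fit = sm.GLM(df["pe"], X, family=sm.families.Binomial(), offset=offset).fit()
    # A clinically plausible, significant subgroup coefficient suggests the CDR is
    # miscalibrated in that subgroup.
    return fit.params[subgroup], fit.pvalues[subgroup]
```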

Research domain 3: can the efficiency and safety of ruling out PE across a broad spectrum of patients be improved with a new clinical decision model that combines clinical items with quantitative D-dimer testing?

Historically, the development of CDRs has aimed at simple scores, since they are meant to be calculated at the bedside to rapidly determine which patients should be referred for D-dimer testing or imaging. Therefore, most clinical decision rules consist of about six to eight items, which are traditionally assigned rounded points based on the regression coefficients from a multivariable model. For the sake of simplicity, continuous variables are dichotomized and potential interactions between items are ignored. In addition, the derivation of most scores did not follow the methodological principles that are nowadays recommended, such as the use of (multilevel) multiple imputation, bootstrapping, and shrinkage. Furthermore, D-dimer testing is often not modeled within the underlying logistic model. Instead, a two-step approach is used: if the clinical decision rule indicates a low probability of PE, a negative D-dimer test is used to select those patients in whom imaging can be safely withheld. In this setting, various D-dimer thresholds have been proposed: the conventional fixed threshold, an age-adjusted threshold, and a threshold dependent on the clinical pre-test probability. Although these thresholds appear to be safe in excluding PE, they are all used di- or trichotomously after clinical pre-test probability has been assessed, thereby ignoring the full predictive value of the quantitative D-dimer result; it is well known, for example, that higher D-dimer levels are associated with a higher probability of PE.
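For illustration, the sketch below contrasts the three dichotomous or trichotomous D-dimer strategies mentioned above, using commonly cited cut-off values (in ng/mL, fibrinogen equivalent units); the exact cut-offs depend on the assay and on the definitions used in each source study, so these numbers are illustrative rather than prescriptive.

```python
# Illustrative rule-out strategies based on the quantitative D-dimer result.

def fixed_threshold(d_dimer: float) -> bool:
    """Conventional fixed rule-out threshold."""
    return d_dimer < 500

def age_adjusted_threshold(d_dimer: float, age: int) -> bool:
    """Age-adjusted threshold: age x 10 ng/mL for patients older than 50."""
    cutoff = age * 10 if age > 50 else 500
    return d_dimer < cutoff

def pretest_adjusted_threshold(d_dimer: float, low_pretest_probability: bool) -> bool:
    """Pre-test-probability-dependent threshold: a higher cut-off when clinical
    probability is low (as in the YEARS algorithm), the conventional cut-off otherwise."""
    cutoff = 1000 if low_pretest_probability else 500
    return d_dimer < cutoff
```

Each strategy reduces the quantitative D-dimer result to a binary decision, which is exactly the limitation the new clinical decision model described below aims to overcome.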

To overcome the methodological limitations of the current clinical decision rules and to improve PE risk prediction, we aim to derive a new clinical decision model consisting of clinical items as well as the quantitative D-dimer result. An IPD dataset provides an excellent framework for this purpose. The increasing use of smartphone applications and websites for calculating risk scores offers several advantages: continuous variables can be used without dichotomization, interactions between variables can be assessed, and risk prediction can be tailored to the healthcare setting or the known disease prevalence. Moreover, it will be possible to provide an absolute, individualized PE probability rather than a probability range.

To this end, various well-known risk factors for PE, signs and symptoms of PE, and the quantitative D-dimer result will be considered for such a full clinical decision model in an overall multilevel, multivariable logistic regression model. Continuous variables will be transformed if appropriate, and clinically plausible interactions will be explored (e.g., active cancer with age, D-dimer with age, and D-dimer with gender). Variables in the final model will be selected using stepwise backward selection. Bootstrapping techniques will be used to internally validate the model and to shrink coefficients accordingly (if needed). The diagnostic performance of this new model will be evaluated using the traditional statistical approaches described under research domain 1 (c-statistic, calibration, and diagnostic indices).

Finally, we intend to validate the new model separately in each of the existing datasets using internal-external cross-validation [19]. With this technique, the model is derived in the total IPDMA set while iteratively excluding one dataset, in which the model is subsequently validated; multiple derivation models are thus fitted and validated. Model performance is then explored in each validation set separately by assessing both discrimination of the derived model (c-statistic) and calibration (expected versus observed ratio and, graphically, the calibration slope in a calibration plot). Ideally, the model performs well in every validation set, providing evidence that the full IPDMA set can be used for model derivation. If model performance is poor in one or more validation sets, the generalizability of the derived model cannot be guaranteed across all patient populations, either because of heterogeneity in baseline risk (i.e., the model intercept), heterogeneity in predictor-outcome associations, or both. In that situation, we will (clinically and statistically) explore model derivation in more homogeneous subsets of our IPDMA and describe to which patient populations the resulting model may (or may not) be applicable, pending validation in new prospective datasets.
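The following minimal sketch illustrates the internal-external cross-validation loop under the simplifying assumption that the candidate model is an ordinary logistic regression on a fixed set of predictors; the protocol's full model additionally involves multilevel structure, transformations, interactions, backward selection, and shrinkage. Column names are hypothetical.

```python
# Sketch only: leave-one-study-out (internal-external) cross-validation.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def internal_external_cv(df: pd.DataFrame, predictors: list[str],
                         outcome: str = "pe", study_col: str = "study_id") -> pd.DataFrame:
    results = []
    for held_out in df[study_col].unique():
        train = df[df[study_col] != held_out]        # derive the model on all other studies
        test = df[df[study_col] == held_out]         # validate it in the omitted study
        model = LogisticRegression(max_iter=1000).fit(train[predictors], train[outcome])
        p = model.predict_proba(test[predictors])[:, 1]
        results.append({
            "held_out_study": held_out,
            "c_statistic": roc_auc_score(test[outcome], p),   # discrimination in the held-out study
            "oe_ratio": test[outcome].mean() / p.mean(),      # observed vs expected PE (calibration-in-the-large)
        })
    return pd.DataFrame(results)
```

Consistently good discrimination and observed:expected ratios close to 1 across all held-out studies would support deriving the final model on the full IPDMA set, as described above.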

Risk of bias assessment

No formal risk of bias assessment tool currently exists for prediction model studies. However, at recent meetings of the Cochrane Collaboration, the so-called PROBAST tool (Prediction model Risk Of Bias ASsessment Tool) has been presented, although it has not yet been formally published. The CHARMS checklist, developed for framing the review question of systematic reviews of prediction model studies and for guiding data extraction and critical appraisal of the primary prediction model studies, also provides guidance on risk of bias [20]. We will therefore use the CHARMS checklist in combination with a preliminary version of the PROBAST tool to construct a checklist for the risk of bias assessment of the studies selected for this IPDMA.

Discussion

Pulmonary embolism is a major healthcare burden and remains a diagnostic challenge, given its often non-specific clinical presentation and the varying performance of the currently recommended diagnostic strategies across healthcare settings, patient characteristics, and comorbidities. Physicians have long struggled with this clinical conundrum. This IPDMA will address these issues and aims to tailor diagnostic assessment to different healthcare settings and to individual patients. We are currently in the final phase of building our dataset with a dedicated group of expert investigators worldwide and expect to publish our first results in late 2018 or early 2019.