Background

COVID-19 has had a large impact worldwide, causing over 6 million deaths [1], various long-term health effects in a large group of individuals, and an increased burden on healthcare providers and medical institutions. Health outcomes of a COVID-19 infection can be severe, particularly in the older population [2]. In the Netherlands, it is estimated that a disproportionately large share of COVID-19 mortality, 88.8% of all COVID-19 deaths, occurred in the older population (≥ 70 years of age), even though this group makes up only 14% of the total population [3]. Mortality proportions among older individuals were similarly high in hospital (60%) and nursing home (40%) settings [3].

Hundreds of prognostic models have been developed to quantify (differences in) mortality risk or other outcomes in COVID-19 patients and to identify individuals at greater risk of developing various future health outcomes [4]. COVID-19 prognostic models have been used to facilitate informed shielding decisions by governments [5], to identify higher-risk groups requiring ventilatory or critical care support early and thereby enable targeted recruitment for randomized controlled trials [6], and to deliver more personalized, risk-based treatments whose effectiveness is known to vary with disease severity [7]. However, despite the development of hundreds of prognostic models, only a few are of high quality and at low risk of bias, according to a critical appraisal in a large living systematic review [4]. For the models that were appraised as being at low risk of bias, information about their actual performance in external validation studies is scarce. Further, because of the added complexity of health conditions such as frailty [8] and multimorbidity [9] in older individuals, we hypothesize that prognostic models derived in the general adult population will underperform when validated in an older population.

In this protocol, we describe a comprehensive external validation study to evaluate the predictive performance of eight prognostic models in the older population, defined as individuals aged 70 years and older, in hospital, primary care, and nursing home settings. The models will thus be evaluated in a population different from the one in which they were derived: older individuals rather than the general adult population. One model will be evaluated in all three healthcare settings (hospital, primary care, and nursing home) to assess its predictive performance across settings.

Methods

We have adhered to the TRIPOD guidelines checklist for external validation studies [10] in reporting this study protocol (Supplementary file 4).

Selection of COVID-19 prognostic models

In the living systematic review of diagnostic and prognostic prediction models for COVID-19 (www.covid-precise.org) [4], all published prediction models were reviewed using PROBAST (www.probast.org) [11, 12], a quality and risk of bias assessment tool for prediction model studies. Using results from the fifth update of this review, we identified all candidate prognostic models that predict the risk of mortality in individuals with COVID-19 infection with an uncertain or low risk of bias. Fifteen candidate models met this criterion. Five of them (PRIEST [13], CUCAF-SF [14], CUCA-SF [14], and QCOVID [15] for males and females) were not included for validation because data on certain predictors were unavailable in the six cohorts of older patients, and two prognostic scores (qSOFA [16] and NEWS [17]) were excluded because they express risk of mortality qualitatively rather than as a predicted risk (Supplementary file 1). Eight prognostic models were included for external validation (Fig. 1). Of these eight models, five were COVID-19-specific (GAL-COVID-19 mortality model [18], 4C Mortality Score [19], NEWS2 + model [20], Xie model [21], and Wang clinical model [22]) and were developed in adult COVID-19 populations during the pandemic. The other three prognostic models already existed before the COVID-19 pandemic and were used to predict in-hospital mortality risk after admission for respiratory infections or sepsis (APACHE-II [23], CURB-65 [24], and SOFA [25]) (Table 1). Details of the eight selected models can be found in Supplementary file 2.

Fig. 1
figure 1

Flowchart for inclusion of prognostic models for external validation

Table 1 Overview of selected prognostic models

Validation cohorts

Data for this external validation study are collected from six cohorts of older individuals presenting with COVID-19 infection in the Netherlands, covering three settings: hospital (three cohorts), primary care (two cohorts), and nursing home (one cohort) (Table 2).

Table 2 Details of study cohorts

Participants

The study participants are older individuals (≥ 70 years of age) presenting with highly suspected or reverse transcription polymerase chain reaction (RT-PCR)-confirmed COVID-19 between March 2020 and December 2020 in hospital, primary care, and nursing home settings.

Before the widespread use of the RT-PCR test for COVID-19 diagnosis, participants were included using proxy criteria. In the hospital cohorts (CliniCo, COVID-OLD, COVID-PREDICT), reported respiratory disease with COVID-like symptomatology is used as an inclusion criterion until 31 March 2020; from April 2020 onwards, a confirmed RT-PCR test for COVID-19 is used. In the primary care cohorts (PHARMO and JHN/ANH/AHA), participants are included based on free-text information and reports of respiratory infections; from June 2020 onwards, ICPC code R83.03 for COVID-19 infection is used as an inclusion criterion. The nursing home cohort (Ysis) uses RT-PCR confirmation as an inclusion criterion. Only participant data on the first presentation or admission with COVID-19 will be included. In the three hospital cohorts, admissions with fewer than 7 days between discharge and readmission will be considered a single hospital admission, as illustrated in the sketch below.
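The following is a minimal sketch of how such readmissions could be collapsed into a single admission episode in R. The data frame and column names (patient_id, admission_date, discharge_date) are hypothetical and do not reflect the cohorts' actual data models.

```r
library(dplyr)

# Hypothetical admissions table: patient 1 is readmitted 5 days after discharge
admissions <- data.frame(
  patient_id     = c(1, 1, 2),
  admission_date = as.Date(c("2020-03-10", "2020-03-25", "2020-04-01")),
  discharge_date = as.Date(c("2020-03-20", "2020-04-02", "2020-04-10"))
)

collapsed <- admissions %>%
  arrange(patient_id, admission_date) %>%
  group_by(patient_id) %>%
  # start a new episode only when the gap since the previous discharge is 7 days or more
  mutate(gap_days = as.numeric(admission_date - lag(discharge_date)),
         episode  = cumsum(is.na(gap_days) | gap_days >= 7)) %>%
  group_by(patient_id, episode) %>%
  summarise(admission_date = min(admission_date),
            discharge_date = max(discharge_date), .groups = "drop")
```

In this example, patient 1's readmission after a 5-day gap is merged into one episode spanning both stays, while patient 2's single admission is left unchanged.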

Outcome

All prediction models have mortality as the predicted outcome (Table 1). In all three hospital cohorts, the outcome is defined as in-hospital mortality. In the primary care and nursing home cohorts, the outcome is defined as 28-day mortality.

Predictors

Definitions and timing of the predictor variables for the eight models were extracted from original publications (Table 1). Recorded variables in the cohorts are matched, as closely as possible, to the original predictor measurement procedures (Supplementary file 3).

Statistical analysis

We will externally validate the eight COVID-19 prognostic models in the six cohorts of older patients with COVID-19, aiming to assess their predictive performance when transported from a general adult population to a specific older population. The GAL-COVID-19 mortality model, which was developed in a primary care setting (general practice), will be validated across all three settings (hospital, primary care, and nursing home) [26]. The 4C Mortality Score, NEWS2 + model, Xie model, Wang clinical model, APACHE-II score, CURB-65 score, and SOFA score were developed in hospitalized populations and will be externally validated in the same setting. The predictive performance of the COVID-19 prognostic models will be evaluated in each cohort separately. The statistical analysis will be performed in R (version 4.0.0 or later) [27].

Descriptive analysis

Participant characteristics and predictor information will be described in all study cohorts (overall and stratified by mortality outcome status) to identify differences in case mix between the development and validation study populations [28]. These comparisons give insight into the expected model performance and transportability [26].
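As a minimal sketch, a stratified descriptive table could be produced with the tableone package; the data frame `cohort`, the variable names, and the `mortality` column are placeholders, and the actual predictor set differs per model (Supplementary file 3).

```r
library(tableone)

# Hypothetical variable names; replace with each model's predictors
vars <- c("age", "sex", "respiratory_rate", "oxygen_saturation", "crp")

tab <- CreateTableOne(vars = vars, strata = "mortality", data = cohort,
                      addOverall = TRUE)
print(tab, smd = TRUE)  # standardized mean differences help flag case-mix differences
```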

Missing data

Missing data will be described to determine possible reasons for and patterns in missingness [29]. Based on these findings, decisions about the handling of missing values in the statistical analysis will be made. We anticipate that missing data will be handled using multiple imputation, either by chained equations (full conditional specification) or by joint modelling (JOMO) [30]. All predictor variables and outcomes of the selected prognostic models will be included in the imputation model to ensure compatibility. A total of 50 imputed datasets will be generated, as the cohorts are expected to have less than 50% missing values for all relevant variables [31].
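A minimal sketch of the chained-equations option using the mice package is shown below; `cohort` is a placeholder data frame containing the models' predictors and the outcome, and the final imputation model will follow the decisions described above.

```r
library(mice)

# 50 imputed datasets; imputation method is chosen per variable type by default
imp <- mice(cohort, m = 50, seed = 2023)

# Extract the completed datasets as a list for downstream performance analyses
imputed_list <- lapply(seq_len(imp$m), function(i) complete(imp, i))
```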

Assessment of predictive performance

For each prognostic model, we will apply the model according to the authors’ original descriptions and evaluate its predictive performance. We will evaluate discrimination (the model’s ability to distinguish individuals who died after presenting with a COVID-19 diagnosis from those who did not) and calibration (the agreement between predicted and observed mortality risks) in each cohort [32]. Discrimination will be assessed for all models by quantifying the area under the receiver operating characteristic curve (c-statistic) [33], pooled by taking the median c-statistic over the imputed datasets, with dispersion quantified using the interquartile range [34].
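A minimal sketch of this pooling, assuming `imputed_list` is a list of imputed datasets (e.g., from the imputation sketch above), a 0/1 `mortality` column, and a placeholder function `predict_risk()` that applies a given prognostic model to one dataset:

```r
library(pROC)

# c-statistic per imputed dataset
cstats <- sapply(imputed_list, function(dat) {
  pred <- predict_risk(dat)   # placeholder: predicted mortality risk per individual
  as.numeric(auc(roc(dat$mortality, pred, quiet = TRUE)))
})

c_median <- median(cstats)                 # pooled c-statistic
c_iqr    <- quantile(cstats, c(0.25, 0.75))  # dispersion across imputations
```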

Calibration will be assessed by visualizing expected versus observed risk using LOESS-smoothed calibration plots on the stacked imputed datasets [35]. The GAL-COVID-19 mortality model, NEWS2 + model, Xie model, and Wang clinical model are full model equations; for these models, calibration will be assessed in terms of the calibration-in-the-large coefficient and the calibration slope [35]. In the case of multiply imputed datasets, these coefficients are pooled on the log-odds scale using Rubin’s rules [36]. For each performance measure of each evaluated model, we will compute the point estimate, standard error, and 95% confidence interval.
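A minimal sketch of these calibration measures is given below, assuming the same placeholders as above (`imputed_list`, a 0/1 `mortality` column, and the hypothetical `predict_risk()`); the calibration-in-the-large coefficient and slope are estimated with logistic regression on the model's linear predictor.

```r
calib <- lapply(imputed_list, function(dat) {
  dat$lp <- qlogis(predict_risk(dat))   # predicted risk -> log-odds (linear predictor)
  citl   <- glm(mortality ~ offset(lp), family = binomial, data = dat)  # calibration-in-the-large
  slope  <- glm(mortality ~ lp,         family = binomial, data = dat)  # calibration slope
  c(citl     = unname(coef(citl)[1]),   citl_se  = sqrt(vcov(citl)[1, 1]),
    slope    = unname(coef(slope)["lp"]), slope_se = sqrt(vcov(slope)["lp", "lp"]))
})
# Estimates and variances can then be pooled across imputations with Rubin's rules,
# e.g. via mice::pool.scalar().

# LOESS-smoothed calibration curve on the stacked imputed datasets
stacked <- do.call(rbind, imputed_list)
cal_dat <- data.frame(pred = predict_risk(stacked), obs = stacked$mortality)
fit     <- loess(obs ~ pred, data = cal_dat)
ord     <- order(cal_dat$pred)
plot(cal_dat$pred[ord], predict(fit)[ord], type = "l",
     xlab = "Predicted mortality risk", ylab = "Observed proportion")
abline(0, 1, lty = 2)   # reference line for perfect calibration
```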

Decision curve analysis

Decision curve analyses will be performed to quantify the net benefit achieved by each model for predicting the originally intended endpoint across risk thresholds ranging from zero to one [37].
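As a minimal sketch, net benefit at threshold pt can be computed directly from its standard definition, NB = TP/n − FP/n × pt/(1 − pt); `pred` and `outcome` are placeholders for predicted risks and observed 0/1 mortality, and dedicated R packages for decision curve analysis could equally be used.

```r
net_benefit <- function(pred, outcome, thresholds = seq(0.01, 0.99, by = 0.01)) {
  n <- length(outcome)
  sapply(thresholds, function(pt) {
    tp <- sum(pred >= pt & outcome == 1)   # true positives at this threshold
    fp <- sum(pred >= pt & outcome == 0)   # false positives at this threshold
    tp / n - fp / n * (pt / (1 - pt))
  })
}

nb_model     <- net_benefit(pred, outcome)                      # evaluated model
nb_treat_all <- net_benefit(rep(1, length(pred)), outcome)      # "treat all" reference
# The "treat none" strategy has a net benefit of 0 at every threshold.
```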

Updating

Prediction models showing miscalibration will be adjusted using an intercept update, and the predictive performance of the recalibrated model will be re-assessed.
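A minimal sketch of such an intercept update for a single (imputed) dataset `dat`, using the same placeholders as above: the original linear predictor is kept as an offset, only a new intercept is estimated, and recalibrated risks are then re-evaluated with the measures described earlier.

```r
dat$lp <- qlogis(predict_risk(dat))   # original model's linear predictor (log-odds)

# Re-estimate the intercept only, keeping the original coefficients fixed via the offset
refit         <- glm(mortality ~ offset(lp), family = binomial, data = dat)
new_intercept <- unname(coef(refit)[1])

# Recalibrated predicted risks
dat$risk_recalibrated <- plogis(new_intercept + dat$lp)
```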

Sensitivity analysis

Two sensitivity analyses will be performed to assess the variation in the predictive performance of the eight COVID-19 prognostic models when implemented in different time periods: January 2021 to December 2021 and March 2020 to December 2021. Additionally, predictive performance in estimating the 90-day mortality risk will be evaluated in cohorts that have data available on this outcome (CliniCo, COVID-PREDICT, PHARMO, JHN/ANH/AHA, Ysis).

Sample size

According to national statistics from July 2020 to December 2021, the COVID-19-related mortality fraction in the older population (≥ 70 years) living at home in the Netherlands (including self-reported COVID-19-positive patients) was around 3% [3]. Although we expect higher mortality among nursing home residents, hospitalized patients, and ICU admissions, we assume that an outcome incidence of 3% is the lowest event fraction we will encounter in the current validation study. We take this fraction as the starting point for the sample size calculations, which are based on the recommendations for external validation studies by Riley and colleagues [38]. Since we are considering various models, we base our sample size calculation on a general scenario that is reasonably applicable to all of them. Based on the distributions of predicted mortality risk previously reported for some of the cohorts [8, 39], we assume the distribution of the linear predictor to be approximately N(−3.9, 1) [38]. Based on these calculations, a sample size of 754 participants per cohort will be required to validate a presumably well-calibrated model (calibration-in-the-large coefficient of 0 and calibration slope of 1) with an assumed target standard error of the calibration slope of 0.02. This target standard error was chosen to evaluate calibration with higher precision than the typically used value of 0.05.

Discussion

This external validation study will assess the predictive performance of pre-existing clinical and COVID-19-specific prognostic models in the older population. Our study will reveal the validity of these COVID-19 prognostic models in one of the most vulnerable groups, the older population, and will give insight into the requirements for tailored prediction models for this group in future COVID-19 waves or pandemics.

While previous studies have largely focused on external validation of COVID-19 prediction models in the general adult population [14, 40, 41], this external validation study will evaluate the performance of existing COVID-19 prognostic models specifically in the older population, which accounts for the largest proportion of hospitalized COVID-19 patients and the highest mortality [3].

External validation in multiple cohorts across different healthcare settings allows for the assessment of between-setting heterogeneity in predictive performance and applicability at different points of care in older COVID-19 patients. This is one of the first studies to evaluate prognostic models for COVID-19 disease in individuals living in nursing homes.

Challenges and limitations

There are certain challenges that we anticipate encountering while conducting this research. Definitions of participant inclusion and predictor measurement procedures are expected to differ across cohorts. Although we have carefully planned to collect information on the definition of a COVID-19 infection and on predictor measurements for each cohort, we cannot rule out the possibility of heterogeneity across settings and its effects on model performance [42]. However, this heterogeneity resembles the conditions encountered in clinical practice and thus provides relevant knowledge on the anticipated real-world performance of the models.

A limitation of this study is that not all low risk of bias COVID-19 prognostic models identified by the living systematic review could be included in this external validation, because the required predictor information was not available in the cohorts. In addition, predictor measurements and the incidence of mortality due to COVID-19 can vary over time owing to newer variants, better medical management, and improved vaccination coverage [43]. These temporal changes can potentially limit the predictive performance of the prognostic models investigated in the current study.

Conclusion

External validation of new and existing COVID-19 prognostic models can provide evidence about their predictive performance when implemented in the older population as tools for effective risk stratification and for aiding decision-making on targeted and timely clinical interventions.