Introduction

Coronavirus disease 2019 (COVID-19) is an emerging disease caused by a new virus, the Severe Acute Respiratory Syndrome CoronaVirus 2 (SARS-CoV-2), which spread quickly across the world and is now responsible for a global pandemic. The public health impact of the disease is considerable. More than 102 million cases of infection have been recorded worldwide, with hospitalization rates reaching 18% depending on the age group [1], leading to more than 2 million deaths [2] and to significant incremental costs to the healthcare system. In addition, collateral damage through an increase in non-communicable diseases dramatically increases the public health burden associated with COVID-19 [3]. It is, therefore, a public health emergency, as declared by the World Health Organization. These figures place significant pressure on hospital resources. Hospitals are filled not only with confirmed COVID-19 patients who need intensive medical care but also with suspected COVID-19 cases who do not necessarily need to be hospitalized. It is therefore essential to determine quickly and efficiently whether a patient is likely to be positive for COVID-19 right after admission to a healthcare center, in order to optimize triage and limit the risk of infections contracted inside the hospital. During the first wave, 12.5% of COVID-19 cases observed in British and Italian hospitals were nosocomial infections [4].

Despite a certain rate of false-negative results [5], the real-time Reverse Transcription-Polymerase Chain Reaction (RT-PCR) test is the gold standard for the etiological diagnosis of COVID-19. However, results often take more than 24 h to obtain because of laboratory work overload, which makes direct triage of patients at high risk of infection difficult. COVID-19 history, clinical judgment and routine clinical investigations (e.g., laboratory data and radiological images) could therefore be of great help in promptly identifying patients at high risk of COVID-19.

Different diagnostic prediction models have been published in the scientific literature. According to Wynants et al. [6], there are some 33 diagnostic models for predicting COVID-19. Among these, only the model by Jehi et al. was identified as promising to be further validated. Other models were assessed as poorly reported with high or unclear Risk of Bias (RoB) in terms of the representativeness of population and lack of external or internal validation. However, according to Brown et al. [7] and subsequently confirmed by Wynants et al. [6], an overall RoB was assessed rather than the applicability of the predictors because the review aimed to document all the available COVID-19 related prediction models, not to answer specific review questions.

In order to evaluate the extent to which COVID-19 prediction models can be applicable in clinical practice, the present systematic review is designed with a more focused research question. Our objective is, therefore, to carry out a systematic review of the most recent literature to identify, evaluate and compare existing models that could prove to be of added value for clinicians in a hospital setting to sort and categorize adult patients at high risk of COVID-19.

Methods

To ensure quality and transparency of reporting in this systematic review, we followed the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines [8]. A protocol was previously registered at the PROSPERO International Prospective Register of Systematic Reviews website (under the reference CRD42021230975). Our issue of interest was first identified and defined using the PICOS strategy. Population: all participants aged 18 years and over with suspected COVID-19 infection in health care settings; Intervention/Exposure: participants had to be evaluated for COVID-19 infection; Comparator: not applicable; Outcome: diagnosis of COVID-19 using RT-PCR; Study design: diagnostic prediction models. Our goal was therefore to systematically search and summarize the studies presenting diagnostic models of COVID-19 in adults with suspected COVID-19 infection in health care settings.

Literature search strategies

The search began on January 4, 2021, applying a specific search strategy on MEDLINE (via Ovid) and Scopus (via Elsevier). Additionally, we manually searched the reference lists of the included studies and the Google Scholar search engine. An update was carried out on February 26, 2021. The search strategy consisted of two key concepts: (1) COVID-19 and (2) prediction models for diagnosis. The full search strategy is available in the Supplementary Material.

Inclusion and exclusion criteria

We developed prespecified eligibility criteria to determine the inclusion of abstracts and articles. Briefly, we included peer-reviewed original studies presenting prediction models for the diagnosis of COVID-19 in adults from health care departments. Prognostic models aiming to evaluate the evolution of COVID-19, which did not meet our research objective, were excluded. English language was required (a restriction that does not affect the quality of a systematic review [9]), and the publication dates considered ran from May 1, 2020 to January 4, 2021, with an update performed on February 26, 2021. Studies published before May 1, 2020 that met our inclusion criteria were identified through the systematic review previously published by Wynants et al. in the British Medical Journal, because it partially covered our topic [6]. Table 1 provides the full inclusion and exclusion criteria applied to meet our objective.

Table 1 Eligibility criteria of references to be included in the systematic review of prediction models to diagnose COVID-19 from the start of the epidemic to March 2021

Selection of studies

Titles and abstracts of references identified by the search strategy were independently reviewed by 3 reviewers (AD, CB and ML) according to the aforementioned eligibility criteria. The full-text review stage followed, and eligible studies were identified by 2 reviewers (AD and ML). At each stage, disagreements were resolved by discussion between reviewers, with a third peer (CB) arbitrating the final inclusion decision if needed. This entire procedure was performed using the Covidence® systematic review management software recommended by the Cochrane Collaboration.

Data extraction

A standardized data extraction form was developed. A pre-pilot extraction of a first reference was carried out to assess the relevance of the data extraction form. Relevant data were then extracted independently by 2 reviewers (AD and ML), and discrepancies were resolved through discussion, with the help of a third peer (CB) if necessary. The extraction was performed according to the CHecklist for critical Appraisal and data extraction for systematic Reviews of prediction Modelling Studies (CHARMS) [10]. The following data of interest were collected: general manuscript information, population description, characterization of predictors and outcome, model development, model performance, results and conclusions.

Risk of bias

Two reviewers (AD and ML) assessed the quality and RoB of each included study using the Prediction model study Risk Of Bias Assessment Tool (PROBAST) [11], designed specifically for systematic reviews of diagnostic prediction models. The RoB was assessed through questions covering four domains: participants, predictors, outcome and analysis. Each question was answered with “high risk of bias”, “unclear risk of bias” or “low risk of bias”. Any conflict was resolved through consensus, with the intervention of a third party (CB) where necessary.

Data synthesis

A descriptive analysis of the included studies was performed under the format of a narrative report. Results were structured according to a primary description of the general characteristics of the included studies, followed by the model development and model performance to conclude with a comprehensive description of the final model.

Results

Of 1843 abstracts reviewed, we identified 65 articles for full-text screening. Subsequently, 13 of these met eligibility criteria as shown in Fig. 1 [12,13,14,15,16,17,18,19,20,21,22,23,24].

Fig. 1 PRISMA flow diagram for the inclusion of prediction models to diagnose COVID-19 developed during the pandemic until March 2021

The Supplementary Material lists all references excluded at the full-text screening stage (n = 52), as recommended by AMSTAR2 [25].

The general characteristics of each included study are described in Table 2. All studies were conducted from 2020 onward, in settings around the world.

Table 2 General characteristics of the included studies in the systematic review of prediction models to diagnose COVID-19 from the start of the epidemic to March 2021

Each study proposed diagnostic models for COVID-19 based on socio-demographics, clinical symptoms, blood tests, or other characteristics, which were compared to the gold standard diagnosis, namely the RT-PCR test. A detailed presentation of the different models developed is available in Table 3. The population consisted of adult patients who presented to hospital because of suspected COVID-19.

Table 3 Description of the different diagnostic models for COVID-19 in the systematic review of prediction models to diagnose COVID-19 from the start of the epidemic to March 2021

For all the models developed, the gold standard was the RT-PCR (the study by Tordjman et al. [14] also added CT-scan as a gold standard). Numerous predictors of COVID-19 were candidates for the final model, from a minimum of 8 candidate predictors for the model of Kurstjens et al. [21] to a maximum of 71 in the study of Huang et al. [20]. The sample size varied from one study to another (i.e., from 100 participants for the studies of Bar et al. [13] and Vieceli et al. [15] to 172,754 for the study of Plante et al. [24]). The research team of Plante [24] was the only one to perform an external validation.

Most of the models were developed using logistic regression. From the fitted regression, some authors derived a score that expresses the increased probability of suffering from COVID-19. Machine learning methods, including XGBoost [14, 22, 24] and random forest [22], as well as other machine learning approaches [17], were also applied. Three models (Bar et al. [13], Fink et al. [18], and McDonald et al. [22]) did not present a score and therefore cannot be applied quickly and easily by the clinician, which limits their applicability in practice.
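To illustrate how a fitted logistic regression can be turned into a bedside score, the following minimal sketch rescales regression coefficients into integer points. The predictor names, coefficients and intercept are hypothetical values chosen for illustration and are not taken from any included study; the rescaling rule (dividing by the smallest absolute coefficient and rounding) is one common convention and is an assumption here, not the method of any particular author.

```python
import math

# Hypothetical log-odds coefficients, for illustration only.
# They are NOT taken from any of the included studies.
COEFFICIENTS = {"fever": 0.9, "low_eosinophils": 1.4, "elevated_crp": 0.7}
INTERCEPT = -2.0


def predicted_probability(patient: dict) -> float:
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum(beta_i * x_i))))."""
    linear = INTERCEPT + sum(
        beta * patient.get(name, 0) for name, beta in COEFFICIENTS.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


def build_points(coefficients: dict) -> dict:
    """Divide each coefficient by the smallest absolute one, then round to integers."""
    base = min(abs(beta) for beta in coefficients.values())
    return {name: round(beta / base) for name, beta in coefficients.items()}


POINTS = build_points(COEFFICIENTS)


def score(patient: dict) -> int:
    """Sum the integer points of the predictors present (coded 1) in the patient."""
    return sum(points * patient.get(name, 0) for name, points in POINTS.items())


if __name__ == "__main__":
    patient = {"fever": 1, "low_eosinophils": 1, "elevated_crp": 0}
    print(POINTS, score(patient), round(predicted_probability(patient), 3))
```

A score of this kind trades some precision (rounding) for ease of use: the clinician adds a few integers instead of evaluating the logistic function.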

The predictors proposed and analyzed by the different researchers were very diverse; however, some predictors recurred. Fever appeared in 7 models, blood eosinophil count in 6 models, and C-reactive protein (CRP) in 5 models. Comorbidities, male gender and chest X-ray were each included as a predictor in four studies. Finally, age, cough and white blood cell (WBC) count were significant predictors in three of the 13 studies, and lymphocytes in two of the 13 studies.

Moreover, it is essential to distinguish predictors that are directly available in the acute phase from those that are not. Predictors directly available during the clinical examination are the most widely used: exposure risk, clinical signs, symptoms, clinical history, socio-demographic data, comorbidities, vital signs, contact tracing, respiratory failure and differential diagnosis. Laboratory tests and imaging, however, require a waiting period before results are available. All the models presented here mixed both types of predictors, with the exception of Callejon-Leblic et al. [17], which requires only demographic and clinical data as well as symptoms.

While all models appeared to be successful in identifying patients at high risk for COVID-19, their performances were not similar. A summary of these is presented in Table 4.

Table 4 Diagnostic performance of the different included models in the systematic review of prediction models to diagnose COVID-19 from the start of the epidemic to March 2021

The sensitivity and specificity varied greatly between models and, for models offering several score cut-offs, according to the cut-off chosen. The Kurstjens et al. [21] model seemed to offer the best sensitivity (98%), while the Gupta-Wright et al. [19] model seemed to offer the best specificity (96.1%).
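For readers less familiar with these indicators, the sketch below shows how sensitivity, specificity and the predictive values follow from a 2 × 2 table of model result against RT-PCR result. The counts are hypothetical and serve only to make the arithmetic concrete; they do not come from any included study.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute standard diagnostic accuracy indicators from a 2 x 2 table.

    tp: model positive, RT-PCR positive    fp: model positive, RT-PCR negative
    fn: model negative, RT-PCR positive    tn: model negative, RT-PCR negative
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of true cases detected
        "specificity": _ratio(tn, tn + fp),  # share of non-cases correctly ruled out
        "ppv": _ratio(tp, tp + fp),          # probability of disease given a positive result
        "npv": _ratio(tn, tn + fn),          # probability of no disease given a negative result
    }


if __name__ == "__main__":
    # Hypothetical counts: 100 RT-PCR positive and 200 RT-PCR negative patients.
    for name, value in diagnostic_metrics(tp=90, fp=40, fn=10, tn=160).items():
        print(f"{name}: {value:.3f}")
```

Unlike sensitivity and specificity, the predictive values depend on the prevalence of COVID-19 in the sample, so they transfer less readily from one setting to another.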

All the models discriminated between people with and without COVID-19 better than chance (all AUROCs > 0.500). The lowest AUROC was observed for the model of Aldobyany [12] (i.e., an AUROC of 0.600) and the highest for the model of Kurstjens et al. [21] (i.e., an AUROC of 0.940).
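The AUROC can be read as the probability that a randomly chosen RT-PCR positive patient receives a higher model score than a randomly chosen RT-PCR negative patient, which is why 0.500 corresponds to chance. The following minimal sketch computes it by direct pairwise comparison; the scores and labels are made-up values used only for illustration.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5).

    scores: model outputs, higher meaning more likely positive.
    labels: 1 for RT-PCR positive, 0 for RT-PCR negative.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    # Made-up example: 3 of the 4 positive/negative pairs are ranked correctly.
    print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise form is quadratic in the sample size and is meant for clarity; rank-based implementations are preferable for large datasets.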

RoB was assessed by two independent reviewers for each individual study (see Fig. 2). In general, the RoB was considered low. However, participant selection was at high RoB in one study (McDonald et al. [22]) because the appropriateness of the inclusion and exclusion criteria and of the data source was difficult to assess. High RoB regarding the outcome was estimated for the model of Fink et al. [18] because information about the outcome was lacking. Finally, the analysis could introduce a RoB in the studies of Nakakubo et al. [23] and Vieceli et al. [15] because they did not meet some predetermined criteria (e.g., statistical selection of predictors, accounting for overfitting in model performance, consideration of all enrolled participants). In Fig. 2, RoB was rated unclear when the paper did not provide enough information to assess it [13, 15].

Fig. 2 Risk of Bias for each individual study in the systematic review of prediction models to diagnose COVID-19 from the start of the epidemic to March 2021 using the PROBAST tool

Discussion

Model identification

Our systematic review of the literature identified 13 studies presenting models for distinguishing patients at high risk of COVID-19 from those unlikely to have COVID-19. When the target population was limited to patients with suspected COVID-19 in hospital settings, in contrast to the study of Wynants et al. [6], the prominent predictors were similar to those found across all population types [6], namely socio-demographics (age and gender), clinical symptoms, vital signs, and laboratory or biological tests. The recurrent predictors, with the number of studies in which they appeared, were fever (7), eosinophils (6), CRP (5), male gender (4), chest X-ray (4), (older) age (3), cough (3), WBC (3), and lymphocytes (2). Comorbidities were included in four of the 13 studies. Yet comorbidities were of little predictive value for COVID-19 positivity in the present study and in Wynants et al. [6]; only obesity, chronic renal failure and coronary artery disease (CAD)/heart failure were found to be significant predictors. In this regard, Vieceli et al. [15] postulated that patients with more comorbidities could have a lower exposure risk because they travel less.

Model comparison

The authors used various methods to develop their models, such as logistic regression, scores, XGBoost and other machine learning methods, with logistic regression being the most common. The number of candidate predictors differed substantially, ranging from 8 to 71 variables per study. Despite its non-negligible false-negative rate, RT-PCR was adopted as the gold standard in all included studies, except in Tordjman et al. [14], where both RT-PCR and CT-scan were used.

The final models were either simple, with as few as four variables (McDonald et al. [22]), or complex, with 15 variables (Plante et al. [24]). To support practical use, some authors also proposed scores [12, 16, 19,20,21, 23], which ease calculation and facilitate clinical use. The nature of the predictors varied across studies (i.e., the included models incorporated different combinations of socio-demographics (age and gender), clinical symptoms, vital signs, and laboratory or biological tests). This has implications for applicability in clinical practice. If the models are to be used to triage patients at first admission, variables that can be collected immediately, such as age, gender, symptoms or contact tracing, are more useful than laboratory or biological tests. The latter are more appropriate for research into the medical and biological aspects of COVID-19.

Even when the same independent variables were used, their measurement and thresholds differed. For example, the cut-off values for age and for biological indicators such as lymphocytes or CRP (mg/L) varied across models. Because standardized thresholds are not available, generalizing the models to a different context is questionable. Furthermore, in some cases the collected variables were country-specific, such as the qSOFA (Bar et al. [13]), and cannot be obtained if the model is to be used in a setting other than the research context.

Given this heterogeneity in variable selection and measurement, straightforward implementation of any single model included in this systematic review is not recommended. Although we applied rigorous criteria in terms of population (adults), setting (hospital) and design (only patients with suspected COVID-19, excluding interventional studies), specifying a model as the most promising for clinical use requires further investigation. However, on the basis of the findings, future studies could make use of the most recurrent significant predictors to further develop or refine existing prediction models.

Model development, performance, and evaluation

Among the included studies, the sample size ranged from fewer than 200 (Bar et al. [13], Nakakubo et al. [23], and Vieceli et al. [15]) to as many as 192,779 (Plante et al. [24]). For studies with small or medium sample sizes, no justification was reported as to whether the sample size was adequate for the prediction model tested. All studies used convenience sampling rather than a more rigorous method such as stratified sampling.

Most studies were conducted at a single site or institution, except those of Gupta-Wright et al. [19] and Plante et al. [24]. The latter was a large-scale study that involved blood test data from 66 US hospitals. Among the studies that reported validation (Kurstjens et al. [21], Tordjman et al. [14], Sung et al. [16], and Gupta-Wright et al. [19]), all used separate development and validation cohorts or bootstrapping; internal validation was therefore the most frequently used method. An exception was Plante et al. [24], where external validation was feasible owing to the scale of the study. This finding echoes the observation of Wynants et al. [6] that overfitting and low predictive power in a new context might be a concern. In this regard, future studies can make use of the recommendations of Riley et al. for sample size estimation and validation to achieve more robust predictive models.
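As an illustration of internal validation by bootstrapping, the sketch below applies an optimism correction to the AUROC: the model-building step is repeated on bootstrap resamples, and the average drop in performance between each resample and the original data is subtracted from the apparent AUROC. The "model" here is deliberately a toy (selecting the single best predictor), and all names and the simulated data are assumptions made for illustration; none of this reproduces the procedure of an included study.

```python
import random


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit(rows, labels):
    """Toy model development: keep the predictor (column index) with the best AUROC."""
    return max(range(len(rows[0])), key=lambda j: auroc([r[j] for r in rows], labels))


def performance(model, rows, labels):
    """AUROC of the selected predictor on the given data."""
    return auroc([r[model] for r in rows], labels)


def optimism_corrected_auroc(rows, labels, n_boot=200, seed=0):
    """Return (apparent AUROC, mean optimism, optimism-corrected AUROC)."""
    if len(set(labels)) < 2:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(rows)
    apparent = performance(fit(rows, labels), rows, labels)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        b_rows = [rows[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        if len(set(b_labels)) < 2:
            continue  # skip resamples that lack one of the outcome classes
        model = fit(b_rows, b_labels)
        optimism.append(
            performance(model, b_rows, b_labels) - performance(model, rows, labels)
        )
    mean_optimism = sum(optimism) / n_boot
    return apparent, mean_optimism, apparent - mean_optimism


if __name__ == "__main__":
    rng = random.Random(1)
    labels = [i % 2 for i in range(60)]
    # Simulated data: one informative predictor and two pure-noise predictors.
    rows = [[rng.gauss(y, 1.0), rng.gauss(0, 1.0), rng.gauss(0, 1.0)] for y in labels]
    print(optimism_corrected_auroc(rows, labels))
```

Internal validation of this kind estimates overfitting within the development data only; it does not replace external validation in a new population.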

To evaluate the models, all studies reported sensitivity and specificity and/or negative predictive value (NPV) and positive predictive value (PPV) as the main indicators of diagnostic performance, except Nakakubo et al. [23], for which no such information was available. No calibration plot was reported in the included studies, which limits insight into how well the developed models are calibrated. All the models discriminated between patients according to the probability of COVID-19 infection better than chance (all AUROCs > 0.500). To that extent, these models demonstrated discriminative ability.

Focusing only on the target population of adult patients with suspected COVID-19 in hospital settings, the systematic review attempted to select studies with a clear description of patient characteristics and context. Using this criterion, one study (McDonald et al. [22]) was deemed to carry a bias in participant selection because, owing to a decision of the health care system, no asymptomatic testing was performed. The bias in terms of predictors was mostly low, except in Bar et al. [13] and Vieceli et al. [15], in which the risk was unclear because cases with low-quality ultrasound scans were excluded. Low bias regarding the outcome was observed in most studies, as expected given the common gold standard of RT-PCR. In Fink et al. [18], there was a potential risk because only results obtained within 72 h were considered eligible. As regards data analysis methods, the models of Nakakubo et al. [23] and Vieceli et al. [15] were more likely to be biased. In the former, the authors assigned arbitrary risk scores based on univariate results from the Fisher exact test, t-test or Mann-Whitney test. In the latter, the bias could be attributable to the small sample size (n = 100) combined with a high number of predictors (43).

Implications for research and practices

The systematic review has identified 13 models worth considering when suspected COVID-19 adult patients in a hospital setting are the target population. Based on the findings, the following implications could be helpful for research and practices.

First of all, the findings revealed recurrent significant predictors that are in accordance with the latest (living) systematic review by Wynants et al. [6]. When only the positivity of COVID-19 is of interest, rather than disease progression or severity, comorbidities might be less important than socio-demographics (age and gender), clinical symptoms, vital signs, and laboratory or biological tests. Therefore, future studies could rely on the identified groups of predictors to refine COVID-19 prediction models and further clarify the role of comorbidities in the probability of COVID-19 infection.

Second, the models presented in this systematic review collected different groups of variables. Variables that can be collected immediately at first screening include socio-demographics (age and gender), clinical symptoms and contact tracing, while others, such as vital signs, laboratory or biological tests and chest X-rays, are only available at a later stage of the admission process. Therefore, when testing facilities are limited and early screening is essential to triage patient flows adequately and rapidly and to prevent nosocomial transmission, clinicians could make use of age and gender as well as the significant clinical symptoms identified at first screening. When the result is an intermediate level of risk rather than a high-risk or low-risk category, it may be preferable to over-triage and then perform further examinations to confirm COVID-19 positivity. In so doing, testing resources could be used effectively and false negatives could be kept to a minimum.
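The over-triage idea can be expressed as a simple decision rule in which patients who are neither clearly low risk nor clearly high risk are handled like high-risk patients until a confirmatory test is available. The sketch below is purely illustrative: the cut-offs, category names and actions are hypothetical and would need to be set and validated locally for whichever score is used.

```python
def triage(score: int, low_cutoff: int = 2, high_cutoff: int = 5) -> tuple:
    """Map a screening score to a (risk category, action) pair.

    low_cutoff and high_cutoff are hypothetical thresholds: scores at or below
    low_cutoff are low risk, scores at or above high_cutoff are high risk, and
    anything in between is intermediate.
    """
    if low_cutoff >= high_cutoff:
        raise ValueError("low_cutoff must be smaller than high_cutoff")
    if score <= low_cutoff:
        category = "low"
    elif score >= high_cutoff:
        category = "high"
    else:
        category = "intermediate"
    # Over-triage: intermediate patients follow the high-risk pathway until
    # a confirmatory RT-PCR result is available.
    if category == "low":
        action = "standard pathway"
    else:
        action = "isolate and confirm with RT-PCR"
    return category, action


if __name__ == "__main__":
    for s in (1, 3, 6):
        print(s, triage(s))
```

Under this rule the cost of uncertainty is borne by isolation capacity rather than by missed cases, which matches the priority of limiting nosocomial transmission.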

Third, a narrative comparison of the 13 models revealed considerable heterogeneity in terms of research design, model complexity, variable selection and measurement, model development methods and statistical reporting. Although this heterogeneity poses challenges for the applicability of these predictive models in practice, future research could capitalize on the existing models to develop or refine predictive models for the specific hospital setting. Another avenue of research is to examine, using agreement indexes, the extent to which the candidate models, i.e., those with adequate sample sizes and sound methodologies, yield comparable predictions. External validation of the candidate models using standardized, multi-center datasets would be valuable to advance our knowledge of the critical predictors of COVID-19 infection. In so doing, measures to mitigate community and nosocomial infections could become more efficient.

Fourth, although biases were inherent in all models included in this systematic review, some of these biases are likely beyond the researchers' control, particularly because the models were developed in a pandemic context. In other words, the complexity of viral mutation, the pressure on healthcare systems facing exponential growth in viral transmission, and the limited availability of testing infrastructure constrained efforts to ensure research integrity. Therefore, instead of viewing the biases as limitations on model application in practice, policy makers, researchers and clinicians are advised to interpret these models as candidate models for a given setting and set of constraints. Methodologically speaking, the application of any model in practice should be preceded by a thorough investigation and a pilot testing phase. In this respect, findings from the present systematic review are helpful to initiate such investigations.

Our study has certain limitations that need to be considered and that call for a cautious interpretation of the results. First, only two databases were searched (Scopus and MEDLINE); searching other databases (e.g., EMBASE) would have been preferable but was not possible in our case because of logistical constraints. Second, we limited our search to published studies; in the context of COVID-19, we found it more appropriate to focus on peer-reviewed studies. Third, our work is a snapshot in time, and we expect many further studies to be published. We therefore carried out the most recent update possible; a living systematic review would be the optimal way to address this time constraint.

Conclusion

The present systematic review yielded 13 predictive models of COVID-19 for suspected adult patients admitted to a health care department. Selected using rigorous inclusion criteria, the models informed and confirmed our knowledge of the most important predictors of COVID-19. Despite inevitable inherent biases, the 13 models demonstrated their effectiveness in predicting COVID-19 positive cases on indicators such as sensitivity, specificity, PPV, NPV, and AUROC. Candidate models among the 13 might be selected depending on the objective, the target population, and the implementation context in terms of human resources and testing infrastructure, and should be subject to further testing and refinement. Future research could build on the findings from this systematic review to advance current knowledge of the significant predictors of COVID-19, using RT-PCR as the gold standard, across different contexts.