Introduction

In late November 2019 in Wuhan, China, the first cases of a novel coronavirus, termed by the International Committee on Taxonomy of Viruses as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), emerged and rapidly spread across China and around the globe [1,2,3]. The SARS-CoV-2 virus, which causes coronavirus disease-2019 (COVID-19), can cause a wide spectrum of life-threatening symptoms including upper respiratory infection, pneumonia, encephalitis, myocarditis, liver failure, and kidney failure [4,5,6].

Over the past year, various observational studies and clinical trials worldwide have focused on investigating strategies for COVID-19 control and treatment [7,8,9]. One actively researched topic centered on identifying factors of disease severity and death, from baseline demographic data such as race, age, and gender, to clinical measurements such as laboratory and radiographic parameters [10,11,12,13].

In this scoping review, we examine empirical studies, published through 10th August, 2020, that investigated the correlation of initial laboratory values with mortality and disease severity in COVID-19. We account for changes in which laboratory markers were profiled over the course of the pandemic, which will inform the medical community about potentially predictive laboratory parameters for adverse clinical outcomes. Moreover, this study will be useful for guiding focused quantitative meta-analyses of laboratory values correlated with mortality and disease severity, as well as for identifying infrequently studied, frequently significant laboratory values that merit further primary analysis.

Material and methods

This scoping review adhered to the staging framework presented by Arksey and O’Malley [14]. Using this framework, we aim to identify research gaps in the existing literature to guide focused meta-analyses of specific laboratory parameters tied to clinical outcomes in COVID-19.

Stage 1: identifying the research question(s)

The research questions that guided this COVID-19 scoping review centered on the laboratory parameters collected upon initial hospital presentation. Specifically, what are the laboratory values reported in the literature among COVID-19 patients? Which laboratory values are reported with primary clinical outcomes including mortality, disease severity, and clinical disease progression? How do the reported laboratory values change over the initial course of the pandemic?

Stage 2: identifying relevant studies

This scoping review was developed following the Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) Checklist [15].

A comprehensive literature search investigating laboratory findings among COVID-19 patients was developed and performed by experienced medical librarians [KM, CJ], upon consultation with lead investigator and project team [JC, AZ, GZ]. The initial search was performed on 9th April, 2020 via OVID MEDLINE® ALL (1946–8th April, 2020). This initial search was translated and re-run via OVID EMBASE (1974–9th April, 2020), The Cochrane Library (Cochrane Database of Systematic Reviews, Cochrane Central Register of Controlled Trials (CENTRAL), Cochrane Methodology Register, Technology Assessments (HTA)), Scopus (Elsevier B.V.), and medRxiv (https://www.medrxiv.org/) on 10th April, 2020. Search terms were clustered in the following manner:

  • (1) The novel Coronavirus or coronavirus disease 2019 (e.g., COVID-19, SARS-nCoV, 2019 Coronavirus)

  • (2) Clinical laboratory test or physiologic component of interest, consisting of the following general categories: BMP Panel (e.g., Na, K, Cl); Hepatic function (e.g., Total protein, Albumin, Prealbumin); Arterial Blood-gas (e.g., PaO2, “PaO2:FiO2”); Hematology (e.g., White blood count, Neutrophil count, Lymphocyte count); Immunologic (e.g., CD3, CD4, CD8); Inflammatory (e.g., IL2R, IL7, IL8); Other (e.g., Angiotensin II, C3)

Weekly search updates were run in each database to ensure the most current literature was captured, with the last search performed on 10th August, 2020. Identified articles were imported into Covidence systematic review software for title, abstract, and full-text screening [16].

Stage 3: study selection

The authors (AZ, GZ, ML, XY, JK, HK, and YC) independently screened citation titles and abstracts, after which potentially relevant articles were reviewed in full. We considered any primary research article reporting laboratory values associated with clinical outcomes of COVID-19 including mortality and disease severity. We excluded articles that focused on patients younger than 18 years or pregnant patients. Secondary papers, including other systematic reviews, were excluded to limit double-counting of data. Although medRxiv papers were initially pulled for purposes of comprehensive review, these papers were later excluded to limit duplication of data. At each step of the study selection process, two reviewers had to reach agreement before the article could proceed to the next step; if no agreement could be reached, a third reviewer provided the deciding opinion. The details of study selection are reported in the PRISMA-ScR flowchart (Fig. 1).
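The agreement rule described above can be sketched as a small decision function. This is a minimal illustration only; the function name `screening_decision` and its parameters are hypothetical and are not part of the Covidence software or of the authors' workflow.

```python
from typing import Optional


def screening_decision(reviewer_a: bool, reviewer_b: bool,
                       tiebreaker: Optional[bool] = None) -> bool:
    """Return True if an article proceeds to the next screening step.

    Hypothetical helper: two reviewers vote independently; when they
    disagree, a third reviewer's vote decides.
    """
    if reviewer_a == reviewer_b:
        # Agreement reached: the shared vote is the decision.
        return reviewer_a
    if tiebreaker is None:
        raise ValueError("reviewers disagree; a third reviewer's vote is required")
    return tiebreaker


print(screening_decision(True, True))                     # True
print(screening_decision(True, False, tiebreaker=False))  # False
```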

Fig. 1

PRISMA flowchart of paper selection process. A total of 28,173 records were identified, from which 13,568 were screened following removal of duplicate records. The majority of these papers were excluded during title and abstract screening, leaving 1126 full-text articles assessed for eligibility. Of these papers, 597 were excluded, leaving 529 papers in this qualitative scoping review synthesis

Stage 4: charting the data

REDCap electronic data capture tools [17], hosted at the Clinical and Translational Science Center at Weill Cornell Medicine, were used by three researchers (AZ, GZ, JC) to determine the variables to extract. Researchers (AZ, GZ, YC, HK, JK, ML, and XY) subsequently updated the data charting tool. The following data items were extracted: general data (article title, journal of publication, hospital(s) of study origin, city/cities of study origin, country/countries of study origin, study start date, study end date); patient characteristics (number of patients, number of male patients, patient age); hospital course (primary clinical outcome, all other clinical outcomes); laboratory values (designated as significant or non-significant at the p = 0.05 threshold). As data extraction was implemented, the data extraction format was modified to accommodate additional laboratory parameters as needed.

Primary clinical outcomes recorded included mortality, disease severity, and progression of disease. Disease severity criteria were recorded if defined by the study, and encompassed definitions by various trials of the NHCC [18], the World Health Organization (WHO) [19], and the American Thoracic Society [20], among others. Some studies did not explicitly define disease criteria but rather presented clinical outcomes of intensive care unit (ICU) admission or mechanical ventilation treatment; these studies were also included under the grouping of disease severity. A small subset of studies investigated multiple primary clinical outcomes, most commonly a combination of mortality and disease severity. For these studies, we assigned mortality as the primary outcome in our analysis.

We evaluated initial laboratory values provided during hospitalization. Laboratory values were generally presented by researchers as absolute serum or urine values, with a minority of papers presenting values as percentages above or below certain defined thresholds. When both absolute values (e.g., absolute lymphocyte count) and percentage values (e.g., percentage lymphocyte count above a particular threshold) were available, we used the p value associated with the absolute laboratory value when recording whether the parameter was statistically significant.

Details of included studies are presented in Supplemental Appendix 2: supplementary file 2. Our team did not perform a formal critical appraisal of primary studies as this procedure was outside of the scoping review goal.

Stage 5: reporting the results

All data collected in Stage 4 of the scoping review were collated, summarized, and reported to map out the published reports on laboratory values of COVID-19.

Results

Literature characteristics

The literature search generated 28,173 records. Of these, 14,605 records were duplicates and 12,442 were excluded for failure to meet criteria, leaving 1126 full-text articles for assessment of eligibility; afterwards, 597 full-text studies were further excluded for failure to meet criteria (Fig. 1).
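The record counts above can be checked with simple subtraction. The short sketch below uses only the numbers stated in this section; the variable names are illustrative.

```python
# Counts taken directly from the text of this section.
identified = 28_173
duplicates = 14_605
excluded_title_abstract = 12_442
excluded_full_text = 597

screened = identified - duplicates                        # records screened
full_text_assessed = screened - excluded_title_abstract   # full texts assessed
included = full_text_assessed - excluded_full_text        # studies included

print(screened, full_text_assessed, included)  # 13568 1126 529
```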

Characteristics of included studies

Table 1 summarizes the major characteristics of the 529 included studies. Study size ranged from 12 to 8673 subjects, with a median of 120 patients per study. The studies involved patients from countries around the world, primarily in Asia (387 [73%]), followed by Europe (97 [18%]), North and South America (41 [8%]), and multinational or other regions of the world (4 [< 1%]). From a patient number standpoint (Supplementary Table 1), the majority of patients were from Asia (97,355 [59%]), followed by North and South America (33,822 [20.5%]) and Europe (26,168 [16%]). A total of 478 and 488 papers mentioned study start and end dates for observation of patient clinical outcomes, respectively, with 39 papers providing neither date; in aggregate, studies reported data from 21st November, 2019 to 21st June, 2020.

Table 1 Characteristics of the 529 included studies

Temporal characteristics of high-frequency laboratory parameters

The high-frequency laboratory parameters across all primary outcomes are summarized in Table 2, which presents two interrelated variables, compartmentalized by time range, for each laboratory parameter: (1) “Percentage of papers with significant results” describes the count of papers that found a laboratory value to be statistically significant divided by the count of all the papers that analyzed the value; (2) “Percentage of patients” describes the sum of the number of total patients with the measured laboratory value divided by the total number of patients among all papers published. Laboratory parameters were categorized by type: hematologic, organ function (renal, hepatic, cardiac), inflammatory, and coagulation. The heatmap colors used were created through graded color scales in Microsoft Excel. Data columns with changes (delta) were assigned color gradation from a scale of -40% to +40% with the midpoint at 0%, capturing the range of the differences, while the remainder of the data columns were assigned color gradation from a scale of 0% to 100%, with a midpoint of 50%. The colors of the heatmap table are on a spectrum, with green corresponding with larger values, yellow corresponding with intermediate values, and red corresponding with smaller values. All subsequent heatmap tables also utilized this color scale.
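The two variables defined above can be illustrated with a minimal sketch. The `PaperRecord` structure, the function names, and the toy data are hypothetical, and the sketch assumes that every patient in a paper reporting a laboratory value counts as having that value measured, which the text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class PaperRecord:
    """Hypothetical charting record for one included paper."""
    n_patients: int
    labs: dict  # lab name -> True if reported significant at p < 0.05


def percent_significant(papers, lab):
    """Papers finding `lab` significant / papers that analyzed `lab`."""
    analyzed = [p for p in papers if lab in p.labs]
    if not analyzed:
        return None
    return 100 * sum(p.labs[lab] for p in analyzed) / len(analyzed)


def percent_patients(papers, lab):
    """Patients in papers reporting `lab` / patients in all papers."""
    total = sum(p.n_patients for p in papers)
    if total == 0:
        return None
    measured = sum(p.n_patients for p in papers if lab in p.labs)
    return 100 * measured / total


# Toy data for illustration only; not drawn from the included studies.
papers = [
    PaperRecord(100, {"lymphocytes": True, "BUN": False}),
    PaperRecord(50, {"lymphocytes": False}),
    PaperRecord(50, {"BUN": True}),
]

print(percent_significant(papers, "lymphocytes"))  # 50.0
print(percent_patients(papers, "lymphocytes"))     # 75.0
```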

Table 2 Heatmap table of the temporal distribution of frequently encountered laboratory parameters in all regions. Laboratory parameters with at least 20 papers per time range were included

An observation emerged in the “percentage of papers with significant results” variable: for many of the high-frequency parameters, there was a decline in statistical significance from the earlier to later studies. For example, the most commonly reported laboratory value was lymphocytes, which appeared in 77% of published papers in both the January–March and April–June time ranges; however, the percentage of these papers that found lymphocyte count to be significant (p < 0.05) decreased from 78% to 57%, a drop of 21 percentage points (a relative decline of 27%). Similarly, among the other six most-studied hematologic parameters, there was a decline in percentage significance over time for leukocytes (−25%), neutrophils (−8%), and platelets (−25%). Among the high-frequency hematologic parameters, only hemoglobin (+9%) increased in percentage significance over time. Unexpectedly, one marker of kidney function, blood urea nitrogen (BUN), exhibited an increase in percentage significance over time from 66% to 86%, while that of creatinine, another marker of kidney function, remained stable (51% to 49%).
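A change between two percentages can be expressed either in percentage points or as a relative change, and the two readings differ noticeably. The sketch below applies both to the lymphocyte figures quoted above (78% to 57%); the function name is illustrative, and which convention underlies the remaining deltas in Table 2 is not stated in the text.

```python
def change(early_pct, late_pct):
    """Return (percentage-point change, relative change in %)."""
    points = late_pct - early_pct
    relative = 100 * points / early_pct
    return points, relative


points, relative = change(78, 57)
print(points, round(relative))  # -21 -27
```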

Clinical outcomes of high-frequency laboratory parameters

Table 3 presents clinical outcomes (mortality and disease severity) with respect to percentage significance of high-frequency laboratory parameters. Although clinical outcomes such as ICU admission, mechanical ventilation, and disease progression were also investigated, there were too few such papers to present high-frequency laboratory value correlates for those clinical outcomes. As seen in Table 3, there was a wide range in the rate of reported significance of laboratory parameters for both mortality and disease severity. With regard to mortality, percentage significance ranged from 22% (fibrinogen) to 91% (BUN). Quite unexpectedly, BUN and creatinine were among the parameters that differed most between mortality (91% for BUN and 71% for creatinine) and disease severity (45% for BUN and 31% for creatinine), possibly indicating that impaired renal function significantly contributes to mortality from COVID-19. With regard to disease severity, percentage significance ranged from 22% (hemoglobin) to 96% (neutrophil-to-lymphocyte ratio). Although a majority of high-frequency laboratory values had similar percentage significance rates for both clinical outcomes, 9/25 (36%) of the most frequently reported laboratory parameters exhibited a difference of more than 20 percentage points in percentage significance, including neutrophil-to-lymphocyte ratio (73% vs. 96%), creatinine (71% vs. 31%), and BUN (91% vs. 45%) for mortality and disease severity, respectively.
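The comparison between outcomes amounts to subtracting the two percentage significance values and flagging gaps larger than 20 percentage points. The sketch below uses only the three parameters whose values are quoted in this paragraph; the dictionary layout and names are illustrative.

```python
# (mortality %, disease severity %) as quoted in the text above.
pct_significant = {
    "BUN": (91, 45),
    "creatinine": (71, 31),
    "neutrophil-to-lymphocyte ratio": (73, 96),
}

THRESHOLD = 20  # percentage points

# Positive gap: more often significant for mortality than for severity.
flagged = {
    lab: mortality - severity
    for lab, (mortality, severity) in pct_significant.items()
    if abs(mortality - severity) > THRESHOLD
}

print(flagged)
# {'BUN': 46, 'creatinine': 40, 'neutrophil-to-lymphocyte ratio': -23}
```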

Table 3 Percentage of statistically significant laboratory values with respect to primary outcomes of mortality and disease severity. Laboratory parameters with at least 20 associated papers with each primary outcome were included

We found that a total of 119 unique laboratory parameters were reported in the literature. Figure 2 presents a graphical representation of percentage significance versus frequency for the parameters studied by at least ten publications, a total of 78 laboratory markers. The figure is subdivided into four quadrants, with a paper frequency of 200 dividing the horizontal axis and a percentage significance of 50% dividing the vertical axis, allowing us to separate laboratory parameters into low and high categories of both frequency and percentage significance.

Fig. 2

Frequency of laboratory values by number of papers. The laboratory values are ordered by increasing frequency on the horizontal axis and by increasing statistical significance on the vertical axis. Names of laboratory values are presented in Appendix 3: supplementary file 2. Note that only laboratory markers found in at least ten papers are included in the figure

Discussion

To our knowledge, this scoping review represents the first attempt to describe the range of laboratory values that are associated with COVID-19 outcomes reported in the literature from January to August 2020 across 28 countries. As methods used to profile laboratory values differed considerably between publications, we were unable to analyze data using a meta-analysis approach without excluding most of the published data. Instead, we analyzed available data using the broader criteria of a scoping review, thus producing an overview of laboratory values profiled during the COVID-19 pandemic. We set out to analyze which values were statistically significant in predicting outcomes of disease, and how each value's predictiveness changed over time and space throughout the course of the pandemic.

Trends in published studies and data

As indicated in Table 2, the number of laboratory values that were statistically significant decreased over time, despite the increasing number of patients per study. That statistical significance decreased as statistical power increased has some interesting implications. One possible explanation is that some characteristics of the patient population, or the treatments they received, differed between the early and late studies. For example, if a patient with leukocytosis was more likely to die in January 2020, but due to improved treatment, a patient in June 2020 with similar leukocytosis survived, that would weaken the correlation between white blood cell count and mortality.

The geographic shift in published papers parallels the above-mentioned temporal shift in laboratory value significance (Supplemental Table 2). This geographic shift can be due to a variety of factors, including regional differences in patient characteristics, management strategies, or even study design itself.

Significance of laboratory parameters

Table 3 demonstrates that reported laboratory values span a wide range of predictiveness. This is a promising finding, suggesting that there likely exists a subset of laboratory values that, when isolated, has high predictive value for clinical outcomes. High-quality meta-analyses are needed urgently to elucidate such markers. While the question is simple, the immensity of published data and the heterogeneity of study methodology make this a daunting task.

Figure 2 shows that in addition to a range of significance, there is also a wide spread in frequency of reporting. This allows us to categorize the reported laboratory values into four groups. The laboratory values in the top-right are both well studied and most significant; we believe these represent the best immediate candidates for meta-analyses. Such parameters include lymphocyte count (398 papers, 73% significance), d-dimer level (375 papers, 78% significance), prothrombin time (294 papers, 76% significance), and neutrophil count (286 papers, 73% significance). The bottom-right contains laboratory values that are well studied, but less significant; these could represent values that are less likely to yield predictive value. The laboratory values in the top-left are promising, as they represent values that are infrequently reported, but with high significance. These values include several markers such as complement C4 (43 papers, 79% significance), eGFR (44 papers, 82% significance), and T cell count (31 papers, 84% significance), and we believe they warrant further investigation in future studies. Finally, the bottom-left values, which were infrequently reported and less often significant, likely represent niche values measured in relatively specialized studies.
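The four-group categorization can be sketched as a simple classification using the cutoffs stated for Figure 2 (200 papers, 50% significance) and the parameter values quoted above. The function name is illustrative, and assigning values that fall exactly on a cutoff to the higher category is an assumption, since the text does not specify how ties were handled.

```python
FREQ_CUTOFF = 200  # papers, from the Figure 2 description
SIG_CUTOFF = 50    # percentage significance, from the Figure 2 description


def quadrant(n_papers, pct_significant):
    """Label a parameter by quadrant (ties assigned upward: an assumption)."""
    horizontal = "right" if n_papers >= FREQ_CUTOFF else "left"
    vertical = "top" if pct_significant >= SIG_CUTOFF else "bottom"
    return f"{vertical}-{horizontal}"


# (papers, % significance) as quoted in the paragraph above.
reported = {
    "lymphocyte count": (398, 73),
    "d-dimer": (375, 78),
    "prothrombin time": (294, 76),
    "neutrophil count": (286, 73),
    "complement C4": (43, 79),
    "eGFR": (44, 82),
    "T cell count": (31, 84),
}

for lab, (n, pct) in reported.items():
    print(f"{lab}: {quadrant(n, pct)}")
```

Under these cutoffs the first four parameters fall in the top-right group and the last three in the top-left group, matching the grouping described in the text.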

An unexpected finding is that certain laboratory values differ greatly in predicting mortality versus disease severity. The five laboratory values with the largest discrepancies were BUN (46% difference), fibrinogen (46% difference), CK-MB (41% difference), creatinine (40% difference), and prothrombin time (32% difference). Of note, with the exception of fibrinogen, these values exhibited decreased significance with respect to disease severity, as compared to mortality. An intriguing hypothesis is that organ-system-specific markers are more correlated with mortality than with disease severity: BUN and creatinine for renal function, prothrombin time for liver synthetic function, and CK-MB for cardiac function (in addition to troponin I, which showed a 24% decrease in significance from mortality to disease severity). While this is only conjecture at this point, the data in Table 3 warrant further investigation. End-organ damage would be expected to have a large impact on survival, while hematologic, immunologic, and inflammatory markers would reflect systemic effects that would cause a more uniform impact on survival and severity.

There are limitations to this study. As it is a scoping review, quantitative statistical analysis was not performed, so we could not make correlative conclusions; we also did not perform detailed quality assessment of studies analyzed. While we did exclude secondary sources to eliminate double-counting of data, it is possible that primary sources from the same hospital systems studied overlapping sets of patients. In addition, we prioritized mortality over disease severity for categorization of primary outcome; during the course of data extraction, it was apparent that several sources included both outcomes as co-primary foci of investigation. Therefore, the disease severity analysis likely undercounts the number of papers and patients. While we did stratify disease severity by specific definitions (e.g., NHCC, WHO, mechanical ventilation, ICU admission), during presentation of the data, all such papers were grouped within one category. As such, the results, in particular Table 3, rely on a heterogeneous disease severity definition.

With regard to future directions, this paper sets the stage for potential quantitative analysis of specific laboratory parameters of interest. In particular, laboratory parameters that appear infrequently in the literature but show a high rate of concordance with clinical outcomes may represent a yet untapped avenue of research.

Conclusion

The aim of this review was to provide an overview of the current understanding of the clinical importance of laboratory parameters for patients with COVID-19, especially their prediction of disease severity and mortality. In this study, we were able to (a) provide a comprehensive overview of the existing literature and synthesize the key takeaways, (b) discuss temporal and geographical differences in study significance and outcomes, and (c) identify specific laboratory parameters that may be of higher value for future studies.

As the pandemic progressed, there was a strong increase in the number of studies and the number of patients included in each study. However, despite changes in both patient demographics and clinical management, several laboratory markers continued to be statistically significant in predicting disease severity and patient mortality. In this study, we identified and ranked major inflammatory, hematologic, and organ-specific markers across geography and time. This study can serve as an important guide for future research efforts as we continue to study and better understand the virus that has caused the current pandemic and claimed millions of lives.