Introduction

Liver injury is frequent in patients with COVID-19, with a prevalence of up to 70% [1, 2]. Greater liver functional impairment is associated with more severe forms of the disease. Subjects with chronic liver disease have a higher risk of severe clinical events related to SARS-CoV-2 infection [3].

The Fibrosis-4 Index (FIB-4) is a score derived from age and routine blood tests: AST, ALT, and platelet count (PLT). This score predicts mortality and non-liver-related clinical outcomes [4, 5]. Some studies have explored the association between COVID-19 outcomes and FIB-4. While some addressed prognostic models [6,7,8,9], others focused on intensive care admission and the need for mechanical ventilation [7, 10, 11]. Machine-learning approaches are a valuable tool for decision-making and capacity allocation and are increasingly used for COVID-19 patients [12,13,14,15,16].

We evaluated clinical and laboratory data of all patients with COVID-19 admitted to Fondazione Policlinico Universitario Gemelli IRCCS (Rome, Italy) in terms of in-hospital mortality, need for mechanical ventilation, length of hospital stay (LOS) within 10 days, and admission to the intensive care unit (ICU). We aimed to develop prognostic models for in-hospital mortality during the first four waves, using a machine-learning approach on routinely collected clinical data of patients with available FIB-4 scores.

Methods

This retrospective study was approved by the local ethics committee (ID: 3119). It included all patients with a confirmed diagnosis of COVID-19 at Fondazione Policlinico Universitario A. Gemelli IRCCS during four waves of the COVID-19 pandemic: (i) first wave (March–June 2020); (ii) second wave (October 2020 to February 2021); (iii) third wave (March–June 2021); (iv) fourth wave (November 2021 to January 2022). Diagnosis of COVID-19 was defined as the presence of ≥ 1 positive RT-PCR SARS-CoV-2 test from a nasopharyngeal swab at admission.

The research was conducted in accordance with the Declarations of Helsinki and Istanbul, and all research was approved by the ethics review committee of Fondazione Policlinico Gemelli IRCCS, Rome, Italy (ID: 3119).

Outcomes

The primary study outcome was in-hospital mortality. Secondary outcomes were ICU admission, mechanical ventilation use, and discharge within 10 days from admission.

Data source

Patients’ data were retrieved from electronic healthcare records using the hospital's data science facility Gemelli Generator Real World Data (G2 RWD), a recently developed data analytics and artificial intelligence platform [12, 17]. All data were deidentified before extraction. The G2 RWD repeatable framework leverages several artificial intelligence (AI) techniques to build the disease-specific data model and data set: in this study, the COVID Data Mart, already described elsewhere [12]. The COVID-19 Data Mart [12] is built on standard procedures that apply natural language processing algorithms to medical reports. These procedures are based on sentence/word tokenization and a rule-based approach supported by annotations defined by clinical subject matter experts (SMEs) [12].

For each patient, information on comorbidities, symptoms, vital signs, laboratory exams, and demographic and clinical data prior to therapy for SARS-CoV-2 infection was evaluated (Supplementary material).

The FIB-4 index was calculated using the following formula: [age (years) × AST (IU/L)] / [platelet count (10⁹/L) × √ALT (IU/L)] [18].
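As a worked illustration, FIB-4 in its standard form, (age × AST) / (platelet count × √ALT), can be computed as in the minimal Python sketch below. The function name `fib4` and the example patient are hypothetical and are not taken from the study data.

```python
import math


def fib4(age_years: float, ast_iu_l: float, alt_iu_l: float, platelets_1e9_l: float) -> float:
    """Return FIB-4 = (age x AST) / (platelets x sqrt(ALT))."""
    if alt_iu_l <= 0 or platelets_1e9_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_iu_l) / (platelets_1e9_l * math.sqrt(alt_iu_l))


# Hypothetical patient: 65 years, AST 40 IU/L, ALT 25 IU/L, platelets 200 x 10^9/L
score = fib4(65, 40, 25, 200)
print(round(score, 2))  # 2.6
```

For this hypothetical patient the score of 2.6 falls above the 2.53 cut-off used later in the analysis.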

Data analysis

Data were analysed by descriptive statistics. Univariate analysis was used to compare potential predictors during each COVID-19 wave for in-hospital mortality, need for mechanical ventilation, ICU admission and discharge within 10 days from admission. Data were compared by the χ² test, the t-test, or the Mann–Whitney U test, as appropriate. A univariate logistic regression model was used to evaluate how the components of the FIB-4 score (age, PLT, ALT and AST) and the score itself influenced the probability of death in each of the four waves.

Furthermore, a multivariable logistic regression model focusing specifically on FIB-4 was fitted on patients' data for each wave. For the statistical modeling, the whole dataset was randomly split into two parts: the training set (75% of the total observations) and the test set (25% of the total observations). A stratified sampling strategy was adopted to preserve patient distributions across waves (Supplementary material).
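A stratified 75/25 split of this kind can be sketched as follows. This is an illustration in plain Python, not the authors' R code; the function name `stratified_split`, the record layout and the toy cohort are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(records, key, train_frac=0.75, seed=0):
    """Split records so that each stratum keeps about train_frac in the training set."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for rec in records:
        by_stratum[rec[key]].append(rec)
    train, test = [], []
    for stratum in sorted(by_stratum):
        group = by_stratum[stratum][:]
        rng.shuffle(group)  # random assignment within the stratum
        cut = round(len(group) * train_frac)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Hypothetical toy cohort: 40 patients in each of four waves
patients = [{"id": i, "wave": 1 + i % 4} for i in range(160)]
train, test = stratified_split(patients, key="wave")
print(len(train), len(test))  # 120 40
```

Splitting within each wave, rather than over the pooled cohort, keeps the wave proportions of the training and test sets equal to those of the full dataset.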

The algorithm led to the following discretization: neutrophil-to-lymphocyte ratio (NLR) score was grouped into three groups (< 3.87, 3.87–7.51 and > 7.51), FIB-4 score was grouped into two groups (< 2.53 and ≥ 2.53), hemoglobin was grouped into two groups (< 12.9 g/dL and ≥ 12.9 g/dL), hematocrit was grouped into two groups (< 38.30% and ≥ 38.30%), calcium was grouped into three groups (< 8.94 mEq/L, 8.94–9.40 mEq/L and > 9.40 mEq/L), urea nitrogen was grouped into three groups (< 17.00, 17.00–26.00, > 26.00 mg/dL) and Charlson score was grouped into four groups (< 3.00, 3.00–4.00, 4.00–6.00, > 6.00).
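Applying such cut-offs to a continuous value is a simple lookup. The sketch below shows it for the FIB-4 and NLR groups only; the function names are hypothetical, and assigning the boundary values 3.87 and 7.51 to the middle NLR group is an assumption based on how the intervals are written.

```python
def fib4_group(score: float) -> str:
    """Two groups, split at the reported cut-off of 2.53."""
    return ">=2.53" if score >= 2.53 else "<2.53"


def nlr_group(nlr: float) -> str:
    """Three groups; boundary values are placed in the middle group (assumption)."""
    if nlr < 3.87:
        return "<3.87"
    if nlr <= 7.51:
        return "3.87-7.51"
    return ">7.51"


# Hypothetical values, not study data
print(fib4_group(2.6), nlr_group(5.0))  # >=2.53 3.87-7.51
```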

A logistic regression model was trained on the training set using forward feature selection to remove any features that did not significantly affect survival prediction.

A new logistic regression model that included only the significant variables coming from forward feature selection and the FIB-4 score was estimated to reduce model dimensionality. Moreover, a 'wave' variable was included in the final model, representing differences across waves in clinical characteristics and organizational factors.

The performance of the final logistic regression model was evaluated on the test set through a receiver operating characteristic (ROC) curve analysis and the resulting AUC. The classification threshold of the logistic regression model was chosen to maximize the Youden index. The performance of the model after tuning is shown in a confusion matrix and by computing accuracy, sensitivity, specificity, and negative predictive value.
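The Youden index is J = sensitivity + specificity − 1, evaluated at each candidate probability threshold. The sketch below shows one way to select the threshold that maximizes J and then derive the reported metrics from the confusion matrix. It uses hypothetical toy labels and predicted probabilities and is not the authors' R implementation; all function names are assumptions.

```python
def confusion(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn), predicting death when probability >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def youden_threshold(y_true, y_prob):
    """Return the observed probability that maximizes J, together with J."""
    best_t, best_j = None, -1.0
    for t in sorted(set(y_prob)):
        tp, fp, tn, fn = confusion(y_true, y_prob, t)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and negative predictive value."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical toy data: 1 = died, 0 = survived
y_true = [0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.7, 0.9]
threshold, j = youden_threshold(y_true, y_prob)
print(threshold, round(j, 2))  # 0.45 0.8
print(metrics(*confusion(y_true, y_prob, threshold)))
```

The sketch requires both outcome classes to be present, and it keeps the lowest threshold when several thresholds tie on J.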

To further investigate the discriminative power of the FIB-4 score for in-hospital mortality, the Kaplan–Meier method was used; results were compared by the log-rank test. A p-value ≤ 0.05 was considered significant unless otherwise stated. All statistical analyses were performed using R software, version 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria).
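The Kaplan–Meier (product-limit) estimate multiplies, at each observed death time, the running survival probability by the fraction of at-risk patients who survive that time. The following minimal sketch illustrates the estimator on hypothetical follow-up data. The function name `kaplan_meier` is an assumption, and the log-rank comparison is not shown.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times: follow-up in days; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather everyone whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up of five patients (days, death indicator)
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print([(t, round(s, 2)) for t, s in curve])  # [(2, 0.8), (3, 0.6), (5, 0.3)]
```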

Results

Cohort characteristics

A total of 4936 patients were evaluated. Demographic data, comorbidities, clinical data at admission, and clinical outcomes are shown in Supplementary Table 1.

Briefly, over the entire timeframe, 1981 (40.2%) patients were discharged within 10 days from admission, and 762 (15.4%) did not survive. The median LOS was 11 days [IQR 6–20 days]. The frequency of ICU admission and mechanical ventilation was 23.9% (n = 1178) and 12.6% (n = 624), respectively.

Daily hospital admissions and deaths across four COVID-19 waves are displayed in Fig. 1, while each wave’s demographic characteristics and outcomes distribution are depicted in Supplementary Table 2. During the second wave, which covered a longer period than the others, in-hospital mortality was higher than in the other waves (448/2336, 19% in the second wave vs 85/570, 15% in the first wave, 133/1064, 12% in the third wave and 96/966, 10% in the fourth wave; p < 0.001). The beginning of the vaccination campaign in 2021 led to a decrease in daily counts of admissions and deaths. Discharge within 10 days shows a positive trend, with an increasing percentage of patients discharged with a shorter length of stay (26%, 39%, 42%, and 48% from the first to the fourth wave, respectively).

Fig. 1 (A) Daily hospital admissions and (B) daily deaths at Fondazione Policlinico Universitario A. Gemelli IRCCS across COVID-19 waves

FIB-4 score did not differ across waves.

In-hospital mortality: primary outcome

During the first wave, the median age of patients who survived was 64 years [IQR 53–76 years], while the median age of patients who did not survive was 83 years [IQR 78–88 years] (p < 0.01; Table 1).

Table 1 In-hospital mortality in the four pandemic waves

Data on ALT were available for 4681 patients, with a median value of 24 IU/L [IQR 15–40 IU/L], and data on AST were available for 1277 patients, with a median of 31 IU/L [IQR 22–49 IU/L]. ALT did not differ by in-hospital mortality status in any of the four waves, while higher AST values were associated with mortality in the first and second waves. Higher values of neutrophils, white blood cells (WBC), procalcitonin, IL-6, D-dimer, glucose, direct bilirubin, INR and LDH were reported in non-survivors compared with survivors (p < 0.01 for all comparisons).

PLT count was recorded in 4801 patients, with a median value of 211 × 10³/mm³ [IQR 162–275 × 10³/mm³] and was higher in survivors compared with non-survivors only during the second and the third waves.

Plasma albumin levels were recorded in 3325 patients, with a median value of 32 g/L [IQR 28–35 g/L]. A lower level of plasma albumin was correlated with poorer survival.

The median of the FIB-4 score was 1.94 [IQR 1.16–3.36]. FIB-4 was available only for 1263/4936 (25.6%) patients (Fig. 2).

Fig. 2 Flowchart of the study

Supplementary Table 1 shows comparisons between patients for whom FIB-4 could be calculated and patients without a FIB-4 measurement. The median age of patients with FIB-4 was 65 years [IQR 51–77 years], while the median age of patients without a FIB-4 measurement was 67 years [IQR 54–79 years], p < 0.01. Patients with FIB-4 had a higher incidence of gastrointestinal disease (p = 0.02), chronic liver disease (p = 0.04), cirrhosis (p < 0.01) and immunodeficiency (p = 0.01). In terms of symptoms at admission, patients with FIB-4 more often had fever, anosmia/dysgeusia, cough (p < 0.01) and dyspnea (p = 0.05). Platelet count and triglycerides were higher in patients without FIB-4 (p < 0.01 and p = 0.01), while alkaline phosphatase was lower (p < 0.01).

Patients who died from COVID-19 had higher FIB-4 compared with those who were alive upon discharge in every single wave. A FIB-4 score < 2.53 was associated with a significantly higher survival probability than a FIB-4 score ≥ 2.53 (p < 0.0001; Fig. 3).

Fig. 3 Kaplan–Meier survival probability curves with FIB-4 as covariate

Differences across COVID-19 waves mainly affect patients with high FIB-4 scores (≥ 2.53) (Fig. 4). Considering the group of patients with a high FIB-4 score, survival curves separated significantly across waves (p < 0.0001). A clear separation exists between the first/second waves, in which the probability of survival decreases rapidly over time, and the third/fourth waves, in which the probability of survival remains more constant over time.

Fig. 4 Kaplan–Meier survival probability curves across COVID-19 waves for patients with (A) FIB-4 score < 2.53 and (B) FIB-4 score ≥ 2.53. p-values are from the log-rank test

In contrast, the separation between waves is not significant among patients with low FIB-4 scores (p = 0.051).

Validation of a multivariable model for the primary outcome

The univariate logistic regression models fitted on the components of the FIB-4 score and the score itself are shown in Fig. 5. The relationship between the FIB-4 score and mortality risk is monotonically increasing, with a steeper curve in the first and second waves. After preprocessing steps, 1143 patients and 35 variables were included in the final dataset. The whole sample was randomly split into a training set (75% of the total number of observations) and a test set (25% of the total number of observations) through stratified sampling to maintain patient distributions across waves (16% during the first wave, 29% during the second wave, 15% during the third wave, 40% during the fourth wave).

Fig. 5 Univariate logistic regression curves. A–D show the probability of death versus the FIB-4 components. E shows the probability of death versus FIB-4. As shown in B and E, mortality risk curves are steeper during the first and second waves: the same AST and FIB-4 values are associated with a higher probability of death than in the third and fourth waves. The same behaviour does not emerge with the other FIB-4 components

The number of patients and the percentage of events in the training and test set were 856 (11.4%) and 287 (10.4%), respectively.

The binomial logistic regression shows that the mortality risk increases for FIB-4 score values ≥ 2.53 (OR = 4.53, 95% CI 2.83–7.25; p ≤ 0.001). Patients during the third wave (OR = 0.34, 95% CI 0.15–0.75; p = 0.007) and the fourth wave (OR = 0.40, 95% CI 0.24–0.66; p ≤ 0.001) had a decreased risk of mortality compared with other patients. The model also showed that mortality risk increases as LDH increases (OR = 1.001, 95% CI 1.000–1.002; p = 0.021). ROC curve analysis showed an AUC of 0.752 on the training set and 0.753 on the test set (Supplementary Fig. 1). The confusion matrix shows an accuracy of 0.76 (95% CI 0.70–0.81) with a sensitivity and specificity of 0.64 and 0.77 on the test set, respectively. Negative and positive predictive values are 0.94 and 0.25, respectively.
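Predictive values and accuracy follow from sensitivity, specificity and the event rate. The sketch below recomputes them from the rounded test-set figures quoted above (sensitivity 0.64, specificity 0.77, event rate 10.4%). Because these inputs are rounded, the results are expected to match the reported values only approximately. The function name `predictive_values` is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> dict:
    """Derive PPV, NPV and accuracy from population fractions of TP, FN, TN, FP."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return {
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": tp + tn,
    }


result = predictive_values(sensitivity=0.64, specificity=0.77, prevalence=0.104)
print({k: round(v, 2) for k, v in result.items()})
# {'ppv': 0.24, 'npv': 0.95, 'accuracy': 0.76}
```

The low positive predictive value alongside a high negative predictive value is what a low event rate produces at this sensitivity and specificity: most patients flagged as high risk still survive, while a low-risk classification is rarely wrong.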

Secondary outcomes (mechanical ventilation, ICU admission, LOS)

Patients requiring mechanical ventilation were older compared with those who did not (72 years [IQR 63–78 years] vs 68 years [IQR 53–80 years], p < 0.05) (Supplementary Tables 3 and 4).

Lower oxygen saturation was also associated with the need for mechanical ventilation (Supplementary Table 3), ICU admission (Supplementary Table 4), mortality (Table 1) and a lower probability of being discharged within 10 days (Supplementary Table 5). Furthermore, lower PLT count was correlated with admission to the ICU only during the second wave. There were no significant differences between the group who required mechanical ventilation and those who did not for the first and fourth waves. Patients admitted to ICU had similar FIB-4 scores compared with those who were not admitted to ICU (median 2.03 [IQR 1.45–3.90] vs 1.76 [IQR 0.98–3.19], p = 0.21). Patients who required mechanical ventilation had higher FIB-4 scores than those who did not (3.15 [IQR 1.56–5.87] vs 1.81 [IQR 1.01–3.26]; p = 0.02). Patients with lower FIB-4 scores were more likely to be discharged within 10 days than patients with high FIB-4 scores (median 1.30 [IQR 0.74–2.36] vs 2.15 [IQR 1.41–4.00], p ≤ 0.01).

Discussion

Our study shows that FIB-4, a simple score based on clinical data derived from routine laboratory analyses upon admission, is correlated with mortality and morbidity in patients with COVID-19. Although this parameter was available only for a proportion of patients, we also show the association of FIB-4 with COVID outcomes across four waves of infection.

Prognostic models for patients with COVID-19 [19,20,21] include factors such as vital signs, age, comorbidities, and radiological features. Our study tested NLR, LDH, BUN, sodium, calcium, age, hemoglobin, and FIB-4 as independent risk factors for poorer outcomes.

FIB-4 has been validated for predicting the risk of fibrosis in liver disease and is recommended as a first-line, non-invasive test to rule out fibrosis [18]. Alterations in liver function tests are frequently reported during SARS-CoV-2 infection and are probably due to direct viral damage to hepatocytes, cytokine release, ischemic liver damage or drug-induced liver injury [22, 23].

In univariate analysis, age was associated with a greater need for mechanical ventilation, LOS and reduced survival. Low PLT was also associated with reduced survival. Regarding transaminases, we found that higher values were related to a greater risk of mechanical ventilation for both ALT and AST. While ALT was related to ICU admission, AST was associated with prolonged LOS. Although we found no difference in mortality associated with transaminases, we observed a difference in FIB-4, which results from the combination of age, ALT, AST and PLT count.

Our results are consistent with other studies that assessed the association of FIB-4 with mortality [2, 24], ICU admission and mechanical ventilation in COVID-19 patients [6, 10, 11].

The meaning of the FIB-4 score in COVID cohorts is still under debate. FIB-4 now appears to have relevance as a biomarker beyond the correlation with liver fibrosis or damage, especially in the case of SARS-CoV-2 infection. Some authors speculated that an elevated FIB-4 could not only reveal underlying liver disease but also reflect a “systemic” or multiorgan involvement of COVID-19 [7]. We agree with this speculation, but further studies are needed to understand the possible mechanisms supporting this hypothesis.

Our AI-based approach allowed us to test the variability over the four COVID-19 waves. Interestingly, among the four variables of the FIB-4 score, age and AST (particularly in the first two waves) showed the greatest prognostic impact on mortality from COVID-19 infection, while ALT and PLT count had a minor but still significant role. Although the prognostic role of age is well established, the role of AST needs further discussion. In SARS-CoV-2-related disease, AST elevation could reflect either direct hepatocellular damage or a systemic inflammatory syndrome involving the liver and muscles. The hepatotropism of SARS-CoV-2 is further exacerbated in chronic liver diseases due to a higher expression of ACE2 receptors as a response to liver fibrosis [25, 26]. An elevation of AST in subjects with SARS-CoV-2 infection without chronic liver diseases could be related to direct cytotoxic damage and systemic and local pro-inflammatory responses. Interestingly, AST and ALT elevations are likely to persist after infection recovery, indicating chronic liver damage that could eventually lead to fibrosis and chronic liver disease [22, 27]. Alterations in liver enzymes are frequent in patients hospitalized for COVID-19, but the trajectory of alterations recorded during hospitalization is not always defined [28], and there is often a lack of data about pre-existing liver disease. The possible influence of such alterations on COVID-related mortality is a matter of debate. The EASL (European Association for the Study of the Liver) recently recommended liver enzyme monitoring in patients hospitalized for SARS-CoV-2 infection [29]. Further studies are needed to better understand the short- and long-term prognostic role of AST and ALT elevation in COVID-19-related disease.

Using a machine learning approach, we identified a cut-off of 2.53 for FIB-4, beyond which the risk of death increases significantly. Other studies have also considered FIB-4 in their models but with different cut-offs. Park et al. [7] found a cut-off of 4.95 to be a good predictor of mortality. Lombardi et al. [30] recently confirmed the prognostic role of FIB-4 in 382 patients. They showed that a FIB-4 < 1.45 is a protective factor against severe SARS-CoV-2 infection and that in patients with at least one metabolic comorbidity, FIB-4 > 1.45 is associated with poorer outcomes. Bucci et al. [31] showed in a prospective cohort of patients that a FIB-4 cut-off of 2.76 has the best prognostic performance for survival in severe COVID-19. The strength of our cut-off is that it has been internally validated by examining the model’s performance in the training and test sets, while the confusion matrix showed a well-balanced accuracy.

This study has some limitations. First, it has a retrospective design. Second, AST is not assessed routinely in the Emergency Department of our hospital, which reduced the availability of FIB-4 data. The hospital’s policy of not providing routine AST determination is based on several factors. These include the limited diagnostic value of AST compared to other liver function tests such as ALT and ALP, which are more specific for liver damage in the emergency room setting. Additionally, the high variability of AST and its relatively higher cost compared to other liver function tests have also been considered. The decision is also in line with the recommendations from scientific societies that prioritize ALT over AST measurement in emergency room settings to assess liver function, as ALT is more specific to liver injury [32, 33]. Third, we were unaware of any previous pharmacological treatments that could potentially contribute to the elevation of liver enzymes before hospital admission. However, this limitation is, in our opinion, mitigated by the large number of cases from a single center, which reduces the potential variability associated with drug use prior to hospitalization. Additionally, we did not have access to laboratory and anamnestic data related to underlying liver disease prior to SARS-CoV-2 infection. A strength of our study is the use of AI in collecting demographic and clinical data from each patient’s clinical diary. Another important point is that our study is monocentric, which eliminates the bias between different laboratory samplings.

The identification of a rapid, non-invasive, and cost-effective predictor of severe disease that could help in the early identification of patients who require more intensive monitoring would be of major clinical value. Indeed, this tool might facilitate the identification of cases at a higher risk of COVID-19-related clinical outcomes. Given its integration and automated data processing features, AI integrated with electronic health records can be used for decision support, with a machine-based triage process that can help wards and Emergency Departments during periods of peak workload. In addition, the availability of a patient-centered data set that is constantly updated with new patients and new clinical and laboratory data allows continuous learning and validation, with the potential to identify structural modifications in the disease patterns, including the influence of variants and vaccines.

Conclusion

In a large monocentric cohort of COVID-19 patients, we showed that FIB-4 assessed at hospital admission could provide prognostic information and help clinicians identify patients with COVID-19 disease at risk of in-hospital mortality, mechanical ventilation, and ICU admission.

FIB-4 could be an easy and inexpensive tool for stratification of the risk for COVID-19 subjects. The evaluation of FIB-4 routinely at patient admission could facilitate risk stratification and optimization of healthcare resources.

Given the specific setup, with predictors being developed and validated on a continuously updated patient cohort, we also show a paradigm for the integration into clinical practice of pragmatic, replicable decision support, enabling rapid assessment of disease severity at baseline and data-driven comparison over time of evolving disease patterns.