Highlights
  • This paper introduces a comprehensive data-driven approach to the COVID-19 pandemic, with four goals: to inform the broader scientific community, to estimate the epidemiological spread of the virus, to provide clinical insights, and to support ventilator allocation decisions for policy makers.

  • To consolidate medical insights, a clinical database on the disease is aggregated from available scientific literature.

  • To assess the risk of infection and mortality, we provide personalized risk prediction models from the electronic health records by leveraging machine learning algorithms.

  • To forecast the progression of COVID-19 and evaluate the impact of various social distancing policies, we develop a dynamic epidemiological model.

  • To inform operational decisions for government officials, our optimization model addresses surges in ventilator demand through state-level and federal reallocation.

1 Introduction

In just a few weeks, the whole world was upended by the outbreak of COVID-19, an acute respiratory disease caused by a new coronavirus called SARS-CoV-2. The virus is highly contagious: it is easily transmitted from person to person via respiratory droplet nuclei and can persist on surfaces for days [22, 43]. As a result, COVID-19 has spread rapidly—classified by the World Health Organization as a public health emergency on January 30, 2020 and as a pandemic on March 11. As of November 2020, over 51 million cases and 1.2 million deaths have been reported globally [20].

Given the uncertainty surrounding the disease and its treatment, healthcare providers and policy makers have wrestled with unprecedented challenges. Hospitals and other care facilities have faced shortages of beds, ventilators and personal protective equipment—raising hard questions on how to treat COVID-19 patients with scarce supplies and how to allocate resources to prevent further shortages. At the policy level, most countries have imposed “social distancing” measures and other non-pharmaceutical interventions to slow the spread of the pandemic. These measures allow strained healthcare systems to cope with the disease by “flattening the curve” [2] but also come at a steep economic price [11, 32]. This trade-off has prompted difficult decisions balancing public health and socio-economic outcomes.

This paper proposes a comprehensive data-driven approach to combat the COVID-19 pandemic. We leverage a broad range of data sources, which include (i) our own cohort-level data aggregating hundreds of clinical studies, (ii) patient-level data obtained from electronic health records, and (iii) census reports on the scale of the pandemic. We develop an integrated approach spanning descriptive analytics (to derive a macroscopic understanding of the disease), predictive analytics (to forecast the near-term impact and longer-term dynamics of the pandemic), and prescriptive analytics (to support healthcare and policy decision-making).

Our approach comprises four steps (Fig. 1):

  • Aggregating and visualizing the most comprehensive clinical database on COVID-19 as of May 2020 (Section 2). We aggregate cohort-level data on demographics, comorbidities, symptoms and lab values from 160 clinical studies. These data paint a broad picture of the disease: identifying common symptoms, disparities between mild and severe patients, and geographic disparities—insights that are hard to derive from any single study and can orient future clinical research on COVID-19, its mutations, and its disparate effects across ethnic groups.

  • Providing personalized indicators to assess the risk of mortality and infection (Section 3). Using patient-level data, we develop machine learning models to predict mortality and infection risk, as a function of demographics, symptoms, comorbidities, and lab values. Using gradient boosting methods, the models achieve strong predictive performance—with an out-of-sample area under the curve above 90%. These models yield personalized calculators that can (i) guide triage, treatment, and care management decisions for strained healthcare systems, and (ii) serve as pre-screening tools for patients before they visit healthcare or testing facilities.

  • Developing a novel epidemiological model to forecast the evolution of the disease and assess the effects of social distancing (Section 4). We propose a new compartmental model called DELPHI, which accounts for COVID-19 features such as underdetection and government response. The model estimates the disease’s spread with high accuracy; notably, its projections from as early as April 3 matched the number of cases observed in the United States up to mid-May and outperformed comparable methods over that period. We also provide a data-driven assessment of social distancing policies, showing that the pandemic’s spread is highly sensitive to the stringency and timing of mitigating measures.

  • Proposing an optimization model to support ventilator allocation in response to the pandemic (Section 5). We formulate a mixed-integer optimization model to allocate ventilators efficiently in a semi-collaborative setting where resources can be shared either directly between healthcare facilities or through a central authority. In the United States, this allows us to study the trade-offs of managing the federal ventilator stockpile in conjunction with inter-state transfers. Results show that limited ventilator transfers could have eliminated shortages in April 2020.

Fig. 1 Overview of our end-to-end analytics approach. We leverage diverse data sources to inform a family of descriptive, predictive and prescriptive tools for clinical and policy decision-making support

This work makes two key contributions. First, we derive data-driven insights about the early stages of the COVID-19 pandemic. Although some of the results should be treated with caution when extrapolated beyond the period spanning March to May 2020, these insights help understand the clinical characteristics of the disease, predict its mortality, forecast its evolution, and ultimately alleviate its impact. Second, we provide a comprehensive roadmap to guide short-term responses to new and unforeseen epidemics. The proposed approach involves four steps: (i) gathering meta-data from early small-scale clinical studies to derive a fast and broad understanding of the disease, (ii) applying predictive analytics based on patient-level data to identify the drivers of the disease and its mortality, (iii) using population-level data on cases, hospitalizations and deaths to predict the macroscopic evolution of the disease, and (iv) leveraging these models for resource allocation optimization to alleviate the near-term damage of the disease. A major feature of this approach is to treat these different questions as interdependent challenges, as opposed to a series of isolated problems. Indeed, clinical decision-making depends directly on patient inflows and available supplies, while resource planning and government responses react to patient-level outcomes. By combining various data sources into descriptive, predictive and prescriptive methods, this paper proposes an end-to-end approach to design a comprehensive and cohesive response to the COVID-19 pandemic and future epidemics.

Ultimately, this paper develops analytical tools to inform clinical and policy responses to the COVID-19 pandemic. These tools are available to the public on a dedicated website.Footnote 1 They have also been deployed in practice to combat the spread of COVID-19 globally. Several hospitals in Europe have used our risk calculators to support pre-triage and post-triage decisions, and a major financial institution in South America is applying our infection risk calculator to determine how employees can safely return to work. A major hospital system in the United States, Hartford Healthcare, planned its intensive care unit (ICU) capacity based on our forecasts, and leveraged our optimization results to allocate ventilators across hospitals when the number of cases was rising. Our epidemiological predictions are used by one of the largest pharmaceutical companies, Janssen Pharmaceuticals, to design a vaccine trial location selection strategy. Our forecasts have also consistently ranked among the top five models incorporated into the US Centers for Disease Control and Prevention’s forecasts [42] and its core ensemble model.

2 Descriptive analytics: clinical outcomes database

Early responses to the COVID-19 pandemic have been inhibited by the lack of available data on patient outcomes. Individual centers released reports summarizing patient characteristics. Yet, this decentralized effort makes it difficult to construct a cohesive picture of the pandemic.

To address this problem, we construct a database that aggregates demographics, comorbidities, symptoms, laboratory blood test results (“lab values”, henceforth) and clinical outcomes from 160 clinical studies released between December 2019 and May 2020—made available on our website for broader use. The database contains information on 133,600 COVID-19 patients (3.13% of the global COVID-19 patients as of May 12, 2020), spanning mainly Europe (81,207 patients), Asia (19,418 patients) and North America (23,279 patients). To our knowledge, this is the largest dataset on COVID-19 that contains detailed clinical outcomes (as of May 2020).

2.1 Data aggregation

Each study was read by a researcher, who transcribed numerical data from the manuscript, and subsequently checked by a second researcher, to verify correctness. The Appendix reports the main transcription assumptions.

Each row in the database corresponds to a cohort of patients—some papers study a single cohort, whereas others study several cohorts or sub-cohorts. Each column reports cohort-level statistics on demographics (e.g., average age, gender breakdown), comorbidities (e.g., prevalence of diabetes, hypertension), symptoms (e.g., prevalence of fever, cough), treatments (e.g., prevalence of antibiotics, intubation), lab values (e.g., average lymphocyte count), and clinical outcomes (e.g., average hospital length of stay, mortality rate). We also track whether the cohort comprises “mild” or “severe” patients (mild and severe cohorts are only a subset of the data).

We should note that, as the pandemic has progressed, clinical knowledge of COVID-19 has improved beyond what was knowable when this study was performed. As a result, the prevalence of the symptoms and the mortality rates reported here may not accurately reflect the current status of the pandemic. For instance, anosmia (loss of the sense of smell) is now recognized as one of the main symptoms of COVID-19 but is not mentioned in this section. Similarly, mortality rates have dropped significantly since March/April, due in particular to advances by the medical community.

Due to the pandemic’s urgency, many papers were published before all patients in a cohort were discharged or deceased. Accordingly, we estimate the mortality rate from discharged and deceased patients only (referred to as “Projected Mortality”).
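The Projected Mortality computation can be sketched as follows; the cohort counts below are hypothetical, for illustration only.

```python
# Sketch of the "Projected Mortality" estimate: mortality is computed over
# resolved patients (discharged or deceased) only, ignoring active patients.
# The cohort counts below are hypothetical.
def projected_mortality(deceased: int, discharged: int) -> float:
    """Mortality rate among patients with a known outcome."""
    resolved = deceased + discharged
    if resolved == 0:
        raise ValueError("no resolved patients in cohort")
    return deceased / resolved

# A cohort with 12 deaths, 88 discharges, and 40 still-active patients yields
# 12 / (12 + 88) = 0.12; the 40 active patients do not enter the estimate.
print(projected_mortality(deceased=12, discharged=88))  # 0.12
```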

2.2 Objectives

Our main goal is to leverage this database to derive a macroscopic understanding of the disease. We break it down into the following questions:

  • Which symptoms are most prevalent?

  • How do “mild” and “severe” patients differ in terms of symptoms, comorbidities, and lab values?

  • Can we identify epidemiological differences in different parts of the world?

2.3 Descriptive statistics

Table 1 depicts the prevalence of COVID-19 symptoms, in aggregate, broken down into “mild” and “severe” patients, and broken down by geographic region. Our key observations are:

  • Cough, fever, shortness of breath, and fatigue are the most prevalent symptoms of COVID-19.

  • COVID-19 symptoms are much more diverse than those listed by public health agencies. COVID-19 patients can experience at least 15 different symptoms. In contrast, the US Centers for Disease Control and Prevention lists seven symptoms (cough, shortness of breath, fever, chills, myalgia, sore throat, and loss of taste/smell) [47]; the World Health Organization lists three symptoms (fever, cough, and fatigue) [52]; and the UK National Health Service lists two main symptoms (fever and cough) [35]. This suggests a lack of consensus among the medical community, and opportunities to revisit public health guidelines to capture the breadth of observed symptoms.

  • Shortness of breath and elevated respiratory rates are much more prevalent in cases diagnosed as severe.

  • Symptoms are quite different in Asia vs. Europe or North America. In particular, more than 75% of Asian patients experience fever, as compared to less than half in Europe and North America. Conversely, shortness of breath is much more prevalent in Europe and North America.

Table 1 Count and prevalence of symptoms among COVID-19 patients, in aggregate, broken down into mild/severe patients, and broken down per continent (Asia, Europe, North America)

Using a similar nomenclature, Table 2 reports comorbidities, demographics, average lab values and average clinical outcomes among all patients, mild patients and severe patients. In terms of demographics, severe patient populations include a higher proportion of male patients and are older on average. Severe patients also have elevated comorbidity rates. Figure 2 visually confirms the impact of age and hypertension rates on population-level mortality—consistent with [15, 16, 40]. In terms of lab values, CRP, AST, BUN, IL-6 and procalcitonin are highly elevated among severe patients.

Table 2 Comorbidities, demographics, average lab values, average length of stay and projected mortality among COVID-19 patients, in aggregate and broken down into mild/severe patients
Fig. 2 Impact of cohort characteristics on projected mortality, assessed at a cohort level. The size of each dot represents the number of patients in the cohort, and its color represents the nation in which the study was performed. We only include studies reporting both discharged and deceased patients

2.4 Discussion and impact

Our database is the largest available source of clinical information on COVID-19 assembled to date (as of May 2020). As such, it provides new insights on common symptoms and the drivers of the disease’s severity. Ultimately, this database can support guidelines from health organizations, and contribute to ongoing clinical research on the disease.

Another benefit of this database is its geographical reach. Results highlight disparities in patients’ symptoms across regions. These disparities may stem from (i) different reporting criteria; (ii) different treatments; (iii) disparate impacts across different ethnic groups; and (iv) mutations of the virus since it first appeared in China. This information contributes to early evidence on COVID-19 mutations [12, 17] and on its disparate effects on different ethnic groups [13, 48].

Finally, the database provides average values of key parameters for our epidemiological model of the disease’s spread and our optimization model of resource allocation (e.g., average length of stay of hospitalizations, average fraction of hospitalized patients put on a ventilator).

The insights derived from this descriptive analysis highlight the need for personalized data-driven clinical indicators. Yet, our population-level database cannot be leveraged directly to support decision-making at the patient level. We have therefore initiated a multi-institution collaboration to collect electronic medical records from COVID-19 patients and develop clinical risk calculators. These calculators, presented in the next section, are informed by several of our descriptive insights. Notably, the disparities between severe patients and the rest of the patient population inform the choice of the features included in our mortality risk calculator. Moreover, the geographic disparities suggest that data from Asia may be less predictive when building infection or mortality risk calculators designed for patients in Europe or North America—motivating our use of data from Europe.

3 Predictive analytics: mortality and infection risk

Throughout the COVID-19 crisis, physicians have made difficult triage and care management decisions on a daily basis. These decisions initially relied on small-scale clinical tests; each clinical test requires significant time, personnel and equipment and thus cannot be easily replicated. As the burden on “hot spots” ebbed in late spring, hospitals began to aggregate rich data on COVID-19 patients. This data offers opportunities to develop algorithmic risk calculators for large-scale decision support, facilitating a more proactive and data-driven strategy to combat the disease globally.

We have established a patient-level database of thousands of COVID-19 hospital admissions. Using state-of-the-art machine learning methods, we develop a mortality risk calculator and an infection risk calculator. The resultant models enable the rapid identification of risk factors and clinical insights which were lacking at the pandemic’s onset. Machine learning is particularly useful in such a setting, where we seek to analyze diverse datasets in a timely and scalable fashion. These personalized risk assessment tools can support critical care management decisions, spanning hospital triage to testing prioritization.

3.1 Methods

This investigation constitutes a multi-center study from healthcare institutions in Spain and Italy, two countries severely impacted by COVID-19. Specifically, we collected data from (i) Azienda Socio-Sanitaria Territoriale di Cremona (ASST Cremona), the main hospital network in the Province of Cremona, and (ii) HM Hospitals, a leading hospital group in Spain with 15 general hospitals and 21 clinical centers spanning the regions of Madrid, Galicia, and León. We applied the following inclusion criteria to the calculators:

  • Mortality Risk: We include adult patients diagnosed with COVID-19 and hospitalized. We consider patients who were either discharged from the hospital or died during their stay—excluding active patients. We include only lab values and vital values collected on the first day in the emergency department, to match the clinical decision setting—predicting prognosis at the time of admission.

  • Infection Risk: We include adult patients who underwent a polymerase chain reaction test for detecting COVID-19 infection at the ASST Cremona hospital [26].Footnote 2 We include all patients, regardless of their clinical outcome. Each patient was subject to a blood test. We omit comorbidities since they are derived from the discharge diagnoses, hence not available for all patients.

We train two models for each calculator: one with lab values and one without lab values. Missing values are imputed using k-nearest neighbors imputation [46]. We exclude features missing for more than 40% of patients. We train binary classification models for both risk calculators, using the XGBoost algorithm [8]. We use SHapley Additive exPlanations (SHAP) [30, 31] to generate importance plots that identify risk drivers and provide transparency on the model predictions. All statistical analyses have been conducted using Python 3.7 [38].
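The modeling pipeline described above can be sketched as follows. This is an illustration on synthetic data only: the paper’s calculators use XGBoost with SHAP explanations on real EHR features, whereas this sketch substitutes scikit-learn’s GradientBoostingClassifier as a readily available gradient boosting stand-in and simulated features.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import KNNImputer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 5))                    # stand-ins for age, vitals, labs
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
X[rng.random(X.shape) < 0.1] = np.nan          # ~10% missing entries

# Features missing for more than 40% of patients would be dropped before this.
X_imp = KNNImputer(n_neighbors=5).fit_transform(X)

X_tr, X_te, y_tr, y_te = train_test_split(X_imp, y, test_size=0.25, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"out-of-sample AUC: {auc:.3f}")
```

In the paper’s setting, a SHAP TreeExplainer would then be applied to the fitted booster to produce the importance plots of Section 3.2.3.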

To evaluate predictive performance, we use 40 random data partitions into training and test sets. We compute the average Area Under the Curve (AUC), sensitivity, specificity, precision, negative predictive value, and positive predictive value. We calculate 95% confidence intervals using bootstrapping.
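The bootstrapped confidence interval can be sketched as follows, with simulated labels and scores standing in for the actual test-set predictions.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)                     # simulated labels
scores = np.clip(y_true * 0.3 + rng.normal(0.5, 0.25, size=500), 0, 1)

boot_aucs = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), size=len(y_true))  # resample with replacement
    if len(np.unique(y_true[idx])) < 2:                   # skip degenerate resamples
        continue
    boot_aucs.append(roc_auc_score(y_true[idx], scores[idx]))

lo, hi = np.percentile(boot_aucs, [2.5, 97.5])
print(f"AUC 95% CI: [{lo:.3f}, {hi:.3f}]")
```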

3.2 Results

3.2.1 Study population

The mortality study population comprises 2,831 patients, 711 (25.1%) of whom died during hospitalization while the remaining ones were discharged. Table 3 summarizes the clinical characteristics of the cohort, both in aggregate and broken down by survival status. The reported features are those used in the final model, namely age, gender, three vital values, 13 lab results, and four comorbidities.

Table 3 Characteristics of study population for mortality prediction model

Our infection cohort comprises 3,135 patients, 1,661 (53.0%) of whom tested positive for COVID-19. This cohort only includes patients from ASST Cremona, as negative test results were not available from HM Hospitals. Table 4 summarizes the clinical characteristics of the cohort, both in aggregate and broken down by COVID-19 test result. Again, the reported features are those used in the final model, namely age, gender, four vital values, and 14 lab values.

Table 4 Characteristics of study population for infection test prediction model

3.2.2 Performance evaluation

All models achieve strong out-of-sample performance. Our mortality risk calculator has an AUC of 93.8% with lab values and 90.5% without lab values. Our infection risk calculator has an AUC of 91.8% with lab values and 83.1% without lab values. These values suggest a strong discriminative ability of the proposed models. We report average results across all random data partitions in the Appendix.

We also report threshold-based metrics in the Appendix, which evaluate the discriminative ability of the calculators at a fixed cutoff. With the threshold set to ensure a sensitivity of at least 90% (motivated by the high costs of false negatives), we obtain accuracies spanning 65%–80%.
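Fixing a cutoff that guarantees at least 90% sensitivity can be sketched as follows, again on simulated labels and scores rather than the actual model outputs.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=400)                          # simulated labels
p = np.clip(y * 0.35 + rng.normal(0.45, 0.2, size=400), 0, 1)

fpr, tpr, thresholds = roc_curve(y, p)
i = np.argmax(tpr >= 0.9)        # first operating point reaching 90% sensitivity
cutoff = thresholds[i]

pred = (p >= cutoff).astype(int)
sensitivity = pred[y == 1].mean()
specificity = (1 - pred[y == 0]).mean()
print(f"cutoff={cutoff:.3f}, sensitivity={sensitivity:.3f}, "
      f"specificity={specificity:.3f}")
```

Lowering the cutoff trades specificity for sensitivity, which is the motivation cited above given the high cost of false negatives.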

The mortality model achieves better overall predictive performance than the infection model. As expected, both models have better predictive performance with lab values than without lab values. Yet, the models without lab values still achieve strong predictive performance.

3.2.3 Model interpretation

Figure 3 plots the SHAP importance plots for all models. The figures sort the features by decreasing significance. For each feature, the corresponding row represents the distribution of its impact on the SHAP value, as the feature’s value ranges from low (blue) to high (red). Higher SHAP values correspond to an increased likelihood of a positive outcome (i.e., mortality or infection). Features with the color scale oriented blue to red (resp. red to blue) from left to right have increasing (resp. decreasing) risk as the feature value increases. For example, “Age” is the most important feature of the mortality score with lab values (Fig. 3a), and older patients have higher predicted mortality.

Fig. 3 SHapley Additive exPlanations (SHAP) importance plots for the mortality and infection risk calculators. The five most important features are shown for each model. Gender is a binary feature (female is equal to 1, shown in red; male is equal to 0, shown in blue). Each row represents the impact of a feature on the outcome, with higher SHAP values indicating higher likelihood of a positive outcome

3.3 Discussion and impact

The proposed models provide algorithmic screening tools that deliver COVID-19 risk predictions using common clinical features. In a constrained healthcare system or in a clinic without access to advanced diagnostics, clinicians can use these models to rapidly identify high-risk patients to support triage and treatment decisions. The models without lab values offer an even simpler tool that could be used outside of a clinical setting. In strained healthcare systems, it can be difficult for patients to obtain direct advice from providers, and so this tool could serve as a pre-screening step. While the exclusion of lab values reduces the AUC (especially for infection), these calculators still perform strongly.

Our models provide insights into risk factors and biomarkers related to COVID-19 infection and mortality. Our results suggest that the main indicators of mortality risk are age, BUN, CRP, AST, and low oxygen saturation. These findings validate several population-level insights from Section 2 and are in agreement with clinical studies and public health guidance. Age is widely recognized as a primary risk factor [55]. Studies have found shortness of breath as a common symptom of severe cases [50, 53], and low oxygen saturation has been further identified as a risk factor even without respiratory symptoms [51]. BUN, CRP, and AST have been identified as key biomarkers in severe COVID-19 cases: elevated BUN as an indicator of kidney dysfunction [9], elevated levels of CRP as an inflammatory marker [7, 49, 54], and elevated AST levels related to liver dysfunction [16, 18].

Turning to infection risk, the main indicators are CRP, WBC, Calcium, AST, and temperature. These findings are also in agreement with clinical reports: an elevated CRP generally indicates an early sign of infection and implies lung lesions from COVID-19 [28], elevated levels of leukocytes suggest cytokine release syndrome caused by SARS-CoV-2 virus [45], and lowered levels of serum calcium signal higher rate of organ injury and septic shock [44]. The agreement between our findings and clinical observations offers credibility for the use of our calculators to support clinical decision-making—although they are not intended to substitute clinical diagnostic or medical expertise.

When lab values are not available, the widely accepted risk factors of age, oxygen saturation, temperature, and heart rate become the key indicators for both risk calculators. We observe that mortality risk is higher for male patients (blue in Fig. 3b) than for female patients (red), confirming clinical reports [21, 29]. An elevated respiratory frequency becomes an important predictor of infection, as reported in [55]. These findings suggest that demographics and vitals provide valuable information in the absence of lab values. However, when lab values are available, these other features become secondary.

The timing of the clinical data, obtained from March through May 2020, is relevant in considering the generalizability of the models. Various factors have affected COVID-19 severity as the pandemic has continued to progress. Treatment protocols, government policies, seasonal factors, and the evolution of the virus itself all potentially contribute to changes in risk determinants. In theory, the risk factors identified in this paper may not generalize to broader populations (e.g., non-European populations, populations in the Fall of 2020). However, our findings have been validated by more recent studies, such as the 4C Mortality Score, which also highlighted the importance of age, oxygen saturation, and CRP in the classification of severe COVID-19 infections [24]. As a result, our proposed infection and mortality models are relevant beyond the geographical limits of Italy and Spain and beyond the critical months of March and April 2020. More broadly, these recent results underscore the importance of prospective validation of risk calculators as we enter subsequent phases of the pandemic to assess their ongoing applicability.

A limitation of the current mortality model is that it does not take into account medication and treatments during hospitalization. This is mainly due to the fact that, during the first months of the COVID-19 pandemic, no systematic treatment protocol had yet been established, and it was therefore challenging to account for treatment effects in our model. Accordingly, our objective in this paper is to uncover the associations between patient characteristics and risk scores, without making claims related to causality. We have addressed this limitation in later research, by leveraging data on treatments to delineate the impact of medications at the patient level 〈REF〉.Footnote 3

Overall, we have developed data-driven calculators that allow physicians and patients to assess mortality and infection risks in order to guide care management—especially with scarce healthcare resources. These tools were developed in the early stages of the pandemic, to support an initial pandemic response. As the pandemic has progressed, testing has become more widespread and affordable. Yet, reagent resources for Reverse Transcription Polymerase Chain Reaction (RT-PCR) testing remain relatively scarce, and the time frame to get RT-PCR results ranges from 24 to 48 hours. Therefore, molecular tests cannot be implemented at large scale in resource-constrained systems, let alone applied on a daily basis. In such instances, our infection score can aid fast and effective decision-making in clinical settings. In addition, testing capacities remain even more limited in non-clinical settings. Notably, our infection calculator has been employed by the Banco de Credito del Peru, the largest bank in Peru, to guide safety protocols for employees and work from home policies. Finally, our mortality calculator remains useful as one of the first models to synthesize clinical features into a single risk score upon hospital admission. This calculator is being used by several hospitals within the ASST Cremona system to support triage and treatment decisions, ultimately alleviating the toll of the pandemic.

4 Predictive and prescriptive analytics: disease projections and government response

We develop a new epidemiological model, called DELPHI (Differential Equations Leads to Predictions of Hospitalizations and Infections). The model first provides a predictive tool to forecast the number of detected cases, hospitalizations and deaths—we refer to this model as “DELPHI-pred”. It then provides a prescriptive tool to simulate the effect of policy interventions and guide government response to the COVID-19 pandemic—we refer to this model as “DELPHI-presc”. All models are fit in each US state (plus the District of Columbia).

4.1 DELPHI-pred: projecting early spread of COVID-19

4.1.1 Model development

DELPHI is a compartmental model, with dynamics governed by ordinary differential equations. It extends the standard SEIR model by defining 11 core states (Fig. 4): susceptible (S), exposed (E), infectious (I), undetected people who will recover (UR) or decease (UD), detected hospitalized people who will recover (DHR) or decease (DHD), quarantined people who will recover (DQR) or decease (DQD), recovered (R) and deceased (D). The separation of the UR/UD, DQR/DQD and DHR/DHD states enables us to fit recoveries and deaths independently from the data. Further auxiliary states, including Total Detected Cases (DT) and Total Detected Deaths (DD), are defined relative to the core states in order to correspond to actual data.

Fig. 4 Simplified flow diagram of DELPHI
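The compartmental dynamics can be sketched in reduced form. The illustration below collapses DELPHI’s 11 states to five (S, E, I, undetected outflow U, cumulative detected cases DT) and uses illustrative parameter values, not the fitted ones from the paper.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative rates: infection, incubation (E -> I), detection probability,
# and rate of leaving the infectious state (not DELPHI's fitted values).
alpha, r_i, p_d, r_out = 0.25, 0.2, 0.3, 0.1

def gamma(t, t0=30.0, k=5.0):
    # Government-response decline in the infection rate (arctan curve).
    return (2 / np.pi) * np.arctan(-(t - t0) / k) + 1

def rhs(t, x):
    S, E, I, U, DT = x
    new_infections = alpha * gamma(t) * S * I
    return [-new_infections,                 # susceptible
            new_infections - r_i * E,        # exposed
            r_i * E - r_out * I,             # infectious
            r_out * (1 - p_d) * I,           # undetected outflow
            r_out * p_d * I]                 # cumulative detected cases

x0 = [0.999, 0.0, 0.001, 0.0, 0.0]           # population fractions
sol = solve_ivp(rhs, (0, 120), x0, t_eval=np.arange(0, 121))
print(f"detected fraction after 120 days: {sol.y[4, -1]:.3f}")
```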

As opposed to many other COVID-19 models (see, e.g., Kissler et al. 2020), DELPHI has explicitly captured two key elements of the pandemic since its inception in late March:

  • Underdetection: Many cases remain undetected due to limited testing, record failures, and detection errors. Ignoring them would underestimate the scale of the pandemic. We capture them through the UR and UD states.

  • Government Response: “Social distancing” policies limit the spread of the virus. Ignoring them would overestimate the spread of the pandemic. We model them through a decline in the infection rate over time. Specifically, we write: \(\frac {\mathrm {d} S}{\mathrm {d} t} = -\alpha \gamma (t)S(t)I(t),\) where α is a constant baseline rate and γ(t) is a time-dependent function characterizing each state’s policies, modeled as follows:

    $$\gamma(t)=\frac{2}{\pi}\arctan\left( \frac{-(t-t_{0})}{k}\right)+1.$$

    The inverse tangent function provides a concave-convex relationship, capturing three phases of government response. In Phase I, most activities continue normally as people adjust their behavior. In Phase II, the infection rate declines sharply as policies are implemented. In Phase III, the decline in the infection rate reaches saturation. The parameters \(t_{0}\) and \(k\) can be thought of as the median date and the strength of the response, respectively.
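The three phases can be illustrated numerically; the values of t0 and k below are illustrative, not fitted parameters.

```python
import numpy as np

def gamma(t, t0, k):
    """Government-response modulation of the infection rate (arctan curve)."""
    return (2 / np.pi) * np.arctan(-(t - t0) / k) + 1

t0, k = 30.0, 5.0  # illustrative median date (day 30) and response strength
# Phase I: near-normal activity; Phase II: sharp decline around t0;
# Phase III: saturation close to zero.
for t in (0, 30, 90):
    print(f"gamma({t:2d}) = {gamma(t, t0, k):.3f}")
```

At t = t0, gamma equals 1—half its initial plateau of roughly 2—which is why t0 acts as the median date of the response.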

Ultimately, DELPHI involves 13 parameters that define the transition rates between the 11 states (details in the Supplementary Materials). We calibrate six of the biological parameters from our clinical outcomes database (Section 2). The remaining seven parameters are optimized using constrained Nelder-Mead optimization [25] and trust-region methods [6], by minimizing a weighted mean squared error on detected cases and deaths:

$$ \begin{array}{@{}rcl@{}} \textbf{Weighted MSE} &=& \sum\limits_{t=1}^{T} t \cdot \left( \widetilde{DT}(t) - DT(t)\right)^{2} \\ &&+ \lambda^{2} \cdot \sum\limits_{t=1}^{T} t \cdot \left( \widetilde{DD}(t) - DD(t)\right)^{2}, \end{array} $$

where DT(t) denotes the historical number of detected cases in a region on day t, \(\widetilde {DT}(t)\) the corresponding prediction from DELPHI, and similarly for DD(t) and \(\widetilde {DD}(t)\) with deaths. The historical data for the number of cases and deaths per US county is taken from the New York Times database [36]. The factor \(\lambda = \min \limits \Big \{ \frac {DT(T)}{3 \cdot DD(T)}, 10\Big \}\) balances the fit between detected cases and deaths; this re-scaling coefficient was obtained experimentally. Each state is included as soon as it records more than 100 cases, so that isolated outbreaks are ignored. Every day, as new data becomes available, we re-optimize the parameters while restricting the optimization range to within 10% of the current fitted values, to ensure a smooth drift. Further details on the fitting procedure are provided in the Appendix.
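
The objective above can be sketched directly in code. Variable names here are illustrative, and both inputs are assumed to be aligned daily cumulative counts:

```python
def delphi_loss(dt_pred, dt_hist, dd_pred, dd_hist):
    # Weighted MSE between DELPHI projections and historical counts.
    # Day t carries weight t, so recent observations dominate the fit;
    # deaths are re-scaled by lam^2, lam = min(DT(T) / (3 * DD(T)), 10).
    T = len(dt_hist)
    lam = min(dt_hist[-1] / (3.0 * dd_hist[-1]), 10.0)
    loss = 0.0
    for t in range(1, T + 1):
        loss += t * (dt_pred[t - 1] - dt_hist[t - 1]) ** 2
        loss += lam ** 2 * t * (dd_pred[t - 1] - dd_hist[t - 1]) ** 2
    return loss
```

The cap of 10 on λ keeps the death term from being over-weighted early in an outbreak, when cumulative death counts are still very small relative to cases.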

4.1.2 Validation

DELPHI was created in late March and has been continuously updated to reflect new observed data. Figure 5a shows our projections made on three different dates, and compares them against historical observations. This plot focuses on the number of cases, but a similar plot for the number of deaths is reported in the Appendix.

In addition to providing aggregate validation figures, we also evaluate the model’s out-of-sample performance quantitatively, using a backtesting procedure. To our knowledge, this represents the first attempt to assess the predictive performance of COVID-19 projections. Specifically, we fit the model’s parameters using data up to April 27, build projections from April 28 to May 12, and evaluate the resulting Mean Absolute Percentage Error (MAPE). Figure 5b reports the results in each US state.

Finally, we compare the predictions of the DELPHI model to those of benchmark models. For obvious reasons, we restrict our attention to models that were available early on in the pandemic and for which historical predictions are still publicly available. Specifically, we consider the models from the Los Alamos National Laboratory (LANL) [27] and Columbia University (CU) [39]. Figure 5c reports the MAPE in the number of cases (averaged over all US states) from the projections made at two different points in time (April 3 and April 17, 2020) for different planning horizons (1, 2, 3, 4 weeks).
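
The accuracy metric used throughout this comparison is straightforward to reproduce. A minimal version, assuming strictly positive actual counts (guaranteed once a state passes the 100-case inclusion threshold):

```python
def mape(pred, actual):
    # Mean Absolute Percentage Error, in percent, averaged over the horizon.
    # Assumes every entry of `actual` is strictly positive.
    return 100.0 * sum(abs(p - a) / a for p, a in zip(pred, actual)) / len(actual)

# Example: forecasts of 110 and 90 against actuals of 100 give a MAPE of 10%.
```

Dividing by the actual count is also why this metric inflates relative errors for states with few cases or deaths, a point discussed below.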

4.1.3 Discussion and impact

Results suggest that DELPHI-pred achieves strong predictive performance. The model has consistently predicted, with high accuracy, the overall spread of the disease for several weeks. Notably, DELPHI-pred was able to anticipate, as early as April 3rd, the dynamics of the pandemic in the United States up to mid-May. At a time when 200,000–300,000 cases were reported, the model was predicting 1.2M–1.4M cases by mid-May—a prediction that proved accurate 40 days later. Note that, at the time, the DELPHI model underpredicted the spread of the pandemic in May, as compared to ex-post data, because early predictions were based on limited data and limited visibility into subsequent governmental policies. As the pandemic progressed, the DELPHI model was able to address these limitations.

Our quantitative results confirm the visual evidence. The MAPE is small across US states. The median MAPE is 8.5% for the number of cases (the 10th and 90th percentiles are 1.9% and 16.7%) and 7.8% for the number of deaths (the 10th and 90th percentiles are 3.3% and 25.1%). Given the high level of uncertainty and variability in the disease’s spread, this level of accuracy is suggestive of excellent out-of-sample performance.

This behavior is further confirmed when we compare our MAPE to that of the two benchmark models. For almost all targets, DELPHI outperforms both models, resulting in lower MAPE—especially in the earliest phases of the pandemic, when data remained scarce.

As Fig. 5b shows, a limitation of our model is that the relative error remains large for a small minority of US states. These discrepancies stem from two main reasons. First, errors are typically larger for states that have recorded few cases (WY) or few deaths (AK, KS, NE). Like all SEIR-derived models, DELPHI performs better on large populations. Moreover, the MAPE metric emphasizes errors on smaller population counts. Second, our model is fitted at the state level, implicitly assuming that the spread of the pandemic is independent from one state to another—thus ignoring inter-state travel. This limitation helps explain the above-median error in a few heartland states which were confronted with the pandemic in later stages (MN, TN, IA).

Fig. 5
figure 5

Projection accuracy for the United States

In summary, DELPHI-pred is a novel epidemiological model of the pandemic that has provided high-quality estimates of the daily number of cases and deaths per US state since the early pandemic period. This model is one of the top five consistent contributors to the forecasts used by the US Centers for Disease Control and Prevention to chart and anticipate the spread of the pandemic [42]. It has also been used by the Hartford HealthCare system—the major hospital system in Connecticut, US—to plan its ICU capacity, and by Janssen Pharmaceuticals to design a vaccine distribution strategy for their leading candidate Ad26.COV2.S.

As the pandemic continued to spread throughout the world, governments enacted further measures, including massive testing and contact-tracing efforts, while clinical treatments improved substantially. Therefore, some of the assumptions of the original DELPHI model (such as time-invariant detection and mortality rates) have become increasingly untenable. Moreover, some of the quantities estimated through the clinical databases have become obsolete as the standard of care has improved. Finally, the behavior of the population in response to governments’ policies has also varied over time. In response, we have since significantly updated the structure of the DELPHI model and its empirical calibration to incorporate these dynamics 〈REF〉.Footnote 4

4.2 DELPHI-presc: toward re-opening society

To inform the relaxation of social distancing policies, we link policies to the infection rate using machine learning. Specifically, we predict the values of γ(t) obtained from the fitting procedure of DELPHI-pred. For simplicity and interpretability, we use a simple model based on regression trees [5] and restrict the independent variables to the policies in place. We classify policies based on whether they restrict mass gatherings, school, travel and work activities, grouping travel and work restrictions together as “other” restrictions since they are most often implemented simultaneously. Using this grouping, we define a set of seven mutually exclusive and collectively exhaustive policies observed in the US data: (i) No measure; (ii) Restrict mass gatherings; (iii) Restrict others; (iv) Authorize schools, restrict mass gatherings and others; (v) Restrict mass gatherings and schools; (vi) Restrict mass gatherings, schools and others; and (vii) Stay-at-home. The remaining policy combinations were not implemented. The regression tree, reported in the Supplementary Materials, was trained on state-level US data (including the District of Columbia) through the end of April, so that the assumption of increasing governmental intervention remains valid. This model achieves an out-of-sample R2 of 0.8, suggesting a good fit to the data.
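
To illustrate the mechanics of such a tree, the sketch below hand-rolls a depth-1 regression tree (a stump) on binary policy indicators. The paper's actual tree is deeper and fitted on real state-level data; the features and targets here are made up purely for illustration:

```python
def fit_stump(X, y):
    # Depth-1 regression tree: pick the single binary feature whose split
    # minimizes within-leaf squared error, and predict the leaf mean.
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    def mean(vals):
        return sum(vals) / len(vals) if vals else 0.0

    best = None
    for j in range(len(X[0])):
        left = [yi for xi, yi in zip(X, y) if xi[j] == 0]
        right = [yi for xi, yi in zip(X, y) if xi[j] == 1]
        score = sse(left) + sse(right)
        if best is None or score < best[0]:
            best = (score, j, mean(left), mean(right))
    _, j, pred_without, pred_with = best
    return j, pred_without, pred_with

# Illustrative (made-up) data: columns = [mass gatherings restricted,
# schools restricted, other (travel/work) restricted]; targets = residual
# infection rate as a percentage of the no-policy gamma(t).
X = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
y = [100.0, 75.0, 55.0, 50.0]
feature, pred_without, pred_with = fit_stump(X, y)
```

A CART implementation such as scikit-learn's DecisionTreeRegressor applies this same splitting criterion recursively to grow a deeper tree.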

The key results for the various policies are summarized in Table 5. Policies (ii) and (iv) are omitted because they were not implemented widely enough to support meaningful statistical inference. “States” records the number of states that implemented the policy, and “State-Days” the total number of implementation days across states. The residual infection rate is computed by normalizing γ(t) under no policy to 100%, and standard errors are obtained by treating each state as an independent sample. We observe that tighter restrictions do indeed generate larger reductions in the residual infection rate. The results also enable comparisons between policies: for instance, although other (travel and work) restrictions reduce the infection rate by 33.2 ± 5.9% when applied alone, their incremental effect once mass gathering and school restrictions are in place is insignificant (5.7 ± 10.2%). This suggests a large overlap between the behavioral changes induced by mass gathering and school restrictions and those induced by other restrictions: once the former are implemented, further limiting travel and work has little additional effect. These quantitative results allow us to predict the value of γ(t) as a function of the policies in place (see Appendix for details) and to simulate the spread of the disease as states progressively loosen social distancing policies.

Table 5 Implementation length and effect of each policy category as implemented in the US

Figure 6a plots the projected case count in the state of New York (NY), for different policies (we report a similar plot for the death count in the Appendix). Note that the stringency of the policies has a significant impact on the pandemic’s spread and ultimate toll. For instance, relaxing all social distancing policies on May 12 can increase the cumulative number of cases in NY by up to 25% by September.

Fig. 6
figure 6

Reopening scenarios for New York

Using a similar nomenclature, Fig. 6b shows the case count if all social distancing policies are relaxed on May 12 vs. May 26. Note that the timing of the policies also has a strong impact: a two-week delay in re-opening society can greatly reduce a resurgence in NY.

The road back to a new normal is not straightforward: results suggest that the disease’s spread is highly sensitive to both the intensity and the timing of social distancing policies. As governments grapple with an evolving pandemic, DELPHI-presc can be a useful tool to explore alternative scenarios and ensure that critical decisions are supported with data. To further understand the disparate impact of the policies across states, we predicted the situation across the US assuming that mass gathering, travel, and work restrictions were implemented in all states on June 16. Figure 7 plots, for many US states, the average weekly prevalence per 100K people in the first two weeks of July against the total percentage of detected cases in the population. One can observe three distinct clusters of states:

  • States with a small number of cumulative cases (relative to the population) that are in a late stage of the pandemic, with relatively few new cases, such as California, Florida, Texas or West Virginia.

  • States with a large number of cumulative cases, but that are in a late stage of the pandemic, with relatively few new cases, such as Connecticut, Louisiana, Massachusetts or New York.

  • States where the pandemic has had a large impact, with a large number of cumulative cases, and where the situation would still be worsening at an alarming rate. These include states like Illinois, Minnesota, Iowa and Virginia. While the worst-case scenario shows a maximum of 2–3% of the population being infected, this suggests that, for these particular states, such a hypothetical policy could be inadequate for controlling the epidemic, and a stronger policy (such as a stay-at-home order) may be needed for some additional time.

Fig. 7
figure 7

United States predictions for mid-July under mass gathering, travel and work restrictions

5 Prescriptive analytics: ventilator allocation

COVID-19 is primarily an acute respiratory disease. The World Health Organization recommends that patients with oxygen saturation levels below 93% receive respiratory support [52]. Following the standard Acute Respiratory Distress Syndrome protocol, COVID-19 patients are initially placed in the prone position and then put under drug-induced paralysis via a neuromuscular blockade to prevent lung injury [10]. Patients are then put on a ventilator, which delivers high concentrations of oxygen while removing carbon dioxide [3]. Early evidence suggests that ventilator intubation reduces the risk of hypoxia for COVID-19 patients [34].

As a result, hospitals have been facing ventilator shortages worldwide [41]. Still, local shortages do not necessarily imply global shortages. For instance, in April 2020, the total supply of ventilators in the United States exceeded the projected demand from COVID-19 patients. Ventilator shortages could thus be alleviated by pooling the supply, i.e., by strategically allocating the surge supply of ventilators from the federal government and facilitating inter-state transfers of ventilators.

We propose an optimization model to support the allocation of ventilators in a semi-collaborative setting where resources can be shared both between healthcare facilities or through a central authority. Based on its primary motivation, we formulate the model to support the management of the federal supply of ventilators and inter-state ventilator transfers in the United States. A similar model has also been used to support inter-hospital transfers of ventilators. This model leverages the demand projections from DELPHI-pred (Section 4) to prescribe resource allocation recommendations—with the ultimate goal of alleviating the health impact of the pandemic.

5.1 Model

Resource allocation is critical when clinical care depends on scarce equipment, and the COVID-19 pandemic has therefore sparked renewed interest in resource allocation problems [37]. In particular, several studies have used optimization to support ventilator pooling. In the context of influenza planning, [19] proposed a time-independent model for stockpiling ventilators. For COVID-19, [33] developed a time-dependent stochastic optimization model to support transfers to and from the federal government, given scenarios regarding the pandemic’s spread. Additionally, [1] and [4] proposed simple network optimization models to evaluate policy scenarios in ventilator sharing. Optimization can also be used to improve patient-level ventilator allocation decisions: for instance, [14] proposed a simulation model to compare the efficacy of respiratory medical interventions with different invasiveness levels. In this section, we propose a deterministic time-dependent ventilator sharing model. As compared to the earlier literature, our model optimizes both the management of the federal stockpile and inter-state transfers; allows each state’s fraction of pooled ventilators to vary continuously over time as a function of the underlying dynamics of the pandemic; allows ventilators to be shared proactively ahead of a state’s peak; and, most importantly, leverages the projections from DELPHI-pred as inputs, thus bridging the gap from predictive to prescriptive analytics.

We model ventilator pooling as a multi-period resource allocation over S states and D days. The model takes as input ventilator demand in state s and day d, denoted as vs,d, as well as parameters capturing the surge supply from the federal government and the extent of inter-state collaboration. We formulate an optimization problem where the key decisions are the number of ventilators transferred from state s to state \(s^{\prime }\) on day d, and the number of ventilators allocated from the federal government to state s on day d. The problem has a multi-period network flow structure with lead times, with additional problem-specific constraints: for instance, we do not allow states currently facing shortages to ship out ventilators.

We propose a bi-objective formulation. The first objective is to minimize ventilator-day shortages; for robustness, we consider both projected shortages (based on demand forecasts) and worst-case shortages (including a buffer in the demand estimates). The second objective is to minimize inter-state transfers, to limit the operational and political costs of inter-state coordination. Mixed-integer optimization provides modeling flexibility to capture spatial-temporal dynamics and the trade-offs between these various objectives. We report the full mathematical formulation of the model, along with the key assumptions, in the Appendix.
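
While the full formulation is a multi-period mixed-integer program (see Appendix), the core flow logic can be conveyed with a single-day greedy heuristic. This is a deliberate simplification, not the paper's model, and the state names and numbers below are hypothetical:

```python
def greedy_transfers(supply, demand, pool_frac=0.10):
    # Single-day ventilator pooling sketch. States in shortage never ship
    # out (their surplus is zero by construction); surplus states share at
    # most pool_frac of their own supply. Surplus is matched to the largest
    # shortages first.
    shortage = {s: max(demand[s] - supply[s], 0) for s in supply}
    surplus = {s: min(max(supply[s] - demand[s], 0), pool_frac * supply[s])
               for s in supply}
    transfers = []  # (from_state, to_state, quantity)
    for to in sorted(shortage, key=shortage.get, reverse=True):
        for frm in sorted(surplus, key=surplus.get, reverse=True):
            qty = min(shortage[to], surplus[frm])
            if qty > 0:
                transfers.append((frm, to, qty))
                shortage[to] -= qty
                surplus[frm] -= qty
    return transfers, sum(shortage.values())

# Hypothetical single-day instance: NY is short 600 ventilators, which the
# 10% pooling caps of PA and OH can fully cover.
supply = {"NY": 8000, "OH": 4000, "PA": 5000}
demand = {"NY": 8600, "OH": 3000, "PA": 4000}
transfers, residual_shortage = greedy_transfers(supply, demand)
```

Unlike this greedy sketch, the optimization model trades off shortages against transfer distance over multiple days, with shipment lead times and federal stockpile decisions.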

5.2 Results

We implemented the model on April 15, a time of pressing ventilator need in the United States. We estimate the number of hospitalizations from DELPHI-pred as the sum of DHR and DHD. From our clinical outcomes database in Section 2, we estimate that 25% of hospitalized patients are put on a ventilator, which we use to estimate the demand for ventilators. We also obtain the average length of stay from our clinical outcomes database (Table 2).
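
The demand estimate itself is a simple transformation of DELPHI-pred's hospitalization states. A minimal sketch, using the 25% ventilation rate from the clinical outcomes database and made-up daily counts:

```python
def ventilator_demand(dhr, dhd, vent_rate=0.25):
    # Daily ventilator demand: hospitalizations (DHR + DHD) times the
    # fraction of hospitalized patients requiring a ventilator (~25%,
    # per the clinical outcomes database).
    return [vent_rate * (r + d) for r, d in zip(dhr, dhd)]

# Hypothetical two-day example: 120 and 240 hospitalized patients imply
# demand for 30 and 60 ventilators.
demand = ventilator_demand([100, 200], [20, 40])
```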

Figure 8a shows the evolution of ventilator shortages with and without ventilator transfers from the federal government and inter-state transfers. These results indicate that ventilator pooling can rapidly eliminate all ventilator shortages. Figure 8c shows ventilator transfers recommended in the US Northeast on April 15 (with inter-state transfers only), overlaid on a map displaying the predicted shortage without transfers.

Fig. 8
figure 8

The edge of optimization to eliminate ventilator shortages

There are different pathways toward eliminating ventilator shortages. Figure 8b shows the trade-off between shortages and transfer distance—each line corresponds to the maximal fraction of its own ventilators that each state can pool. Overall, states do not have to share more than 10% of their supply at any time to efficiently eliminate shortages. States can largely meet their needs with help from neighboring states, with cross-country transfers only used as a last resort. Broadly, results underscore trade-offs between ventilator shortages, the extent of inter-state transfers, the number of ventilators allocated from the federal government, and the robustness of the solution.

For instance, Fig. 9 shows the Pareto frontier between the model’s two objectives, inter-state transfer distance and ventilator shortage, as a function of two model parameters: α (capturing the demand buffer that states would like to plan for) and a surge supply multiplier (capturing by how much the federal government’s ventilator supply varies from our estimates). Note that the buffer α does not significantly impact the number of inter-state transfers or the amount of ventilator shortages, suggesting that the solution can be made robust at a limited overall cost. In contrast, as the surge supply decreases, the number of required ventilator transfers and the amount of ventilator shortages increase—highlighting the need for stronger cooperation between states as federal supply drops.

Fig. 9
figure 9

Influence of additional buffer and federal surge availability on ventilator shortages and transfers

5.3 Discussion and impact

Our main insight is that ventilator shortages could be eliminated altogether through inter-state transfers and strategic management of the federal supply. Results also underscore (i) the benefits of inter-state coordination and (ii) the benefits of early coordination. First, ventilator shortages can be eliminated through inter-state transfers alone: leveraging a surge supply from the federal government is not required, though it may reduce inter-state transfers. Under our recommendation, the most pronounced transfers occur from states facing no shortages (Ohio, Pennsylvania, and North Carolina) to states facing strong shortages (New York, New Jersey). Second, most transfers occur in the early stages of the pandemic. This underscores the benefits of leveraging a predictive model like DELPHI-pred to align the ventilator supply with demand projections as early as possible.

We have developed a similar model to support the re-distribution of ventilators across hospitals within the Hartford HealthCare system in Connecticut—using county-level forecasts of ventilator demand obtained from DELPHI-pred. This model was used by Hartford HealthCare to align ventilator supply with projected demand at a time where the pandemic was on the rise.

Looking ahead, the model provides a methodology to recommend allocations of critical resources in the next phases of the pandemic, from ventilators to drugs to personal protective equipment. Since epidemics do not peak in each state at the same time, states whose infection peak has already passed or lies weeks ahead can help other states facing immediate shortages, at little cost to their constituents. Inter-state transfers of ventilators occurred in an isolated fashion through April 2020; our model provides an automated decision-making tool to support these decisions systematically. As our results show, proactive coordination and resource pooling can significantly reduce shortages—thus increasing the number of patients that can be treated without resorting to extreme clinical recourse with side effects (such as splitting ventilators). Other ventilator sharing studies, including [1], [4], and [33], reach similar conclusions despite differences in modeling assumptions. Though the exact nature of the best ventilator sharing policy can be debated, our results confirm that even simple collaboration policies can alleviate, or even eliminate, ventilator shortages.

6 Conclusion

This paper proposes a comprehensive data-driven approach to address several core challenges faced by healthcare providers and policy makers in the midst of the COVID-19 pandemic. We have gathered and aggregated data from hundreds of clinical studies, electronic health records, and census reports. We have developed descriptive, predictive and prescriptive models, combining methods from machine learning, epidemiology, and mixed-integer optimization. Results provide insights on the clinical aspects of the disease, on patients’ infection and mortality risks, on the dynamics of the pandemic, and on the levers that policy makers and healthcare providers can use to alleviate its toll. The models developed in this paper also yield decision support tools that have been deployed on our dedicated website and that are actively being used by several hospitals, companies and policy makers.