Generating simple classification rules to predict local surges in COVID-19 hospitalizations

Low rates of vaccination, emergence of novel variants of SARS-CoV-2, and increasing transmission related to seasonal changes and relaxation of mitigation measures leave many US communities at risk for surges of COVID-19 that might strain hospital capacity, as in previous waves. The trajectories of COVID-19 hospitalizations differ across communities depending on their age distributions, vaccination coverage, cumulative incidence, and adoption of risk-mitigating behaviors. Yet, existing predictive models of COVID-19 hospitalizations are almost exclusively focused on national- and state-level predictions. This leaves local policymakers in urgent need of tools that can provide early warnings about the possibility that COVID-19 hospitalizations may rise to levels that exceed local capacity. In this work, we develop a framework to generate simple classification rules to predict whether COVID-19 hospitalizations will exceed the local hospitalization capacity within a 4- or 8-week period if no additional mitigating strategies are implemented during this time. This framework uses a simulation model of SARS-CoV-2 transmission and COVID-19 hospitalizations in the US to train classification decision trees that are robust to changes in the data-generating process and future uncertainties. These generated classification rules use real-time data related to hospital occupancy and new hospitalizations associated with COVID-19, and when available, genomic surveillance of SARS-CoV-2. We show that these classification rules achieve reasonable accuracy, sensitivity, and specificity (all ≥ 80%) in predicting local surges in hospitalizations under numerous simulated scenarios, which capture substantial uncertainties over the future trajectories of COVID-19. Our proposed classification rules are simple, visual, and straightforward to use in practice by local decision makers without the need to perform numerical computations.
Supplementary Information: The online version contains supplementary material available at 10.1007/s10729-023-09629-4.


S2.1 Simulation framework
To construct the model, we introduce notation in which $\Delta t$ is the time step of the simulation (e.g., $\Delta t = 1$ day). To generate epidemic trajectories for this model, we use Monte Carlo simulation to sample from this Markov chain using the following approach. Consider a particular compartment $Z$ from which members depart due to $I$ events, each occurring at rate $\lambda_i$, $i \in \{1, 2, \ldots, I\}$. For example, members of the Susceptible compartment may leave due to 1) infection with the ancestral strain, 2) infection with the delta variant, 3) infection with the novel variant, or 4) vaccination (i.e., $I = 4$) (see Fig. 2). If the number of individuals in compartment $Z$ at time $t$ is $Z(t)$, then the numbers of individuals that leave this compartment due to events $i \in \{1, 2, \ldots, I\}$ follow a multinomial distribution with total count $Z(t)$ and probabilities $(p_0, p_1, \ldots, p_I)$, where $p_0$ is the probability of not leaving compartment $Z$ during $[t, t + \Delta t]$, and $p_i = \frac{\lambda_i}{\sum_{j=1}^{I} \lambda_j}(1 - p_0)$ is the probability of leaving compartment $Z$ during $[t, t + \Delta t]$ due to event $i \in \{1, 2, \ldots, I\}$.
To identify the new epidemic state at the next time step, we first sample from the multinomial distributions associated with each compartment and then use these realizations to calculate the new epidemic state given the current epidemic state. The events that drive the epidemic are represented by black arrows in Fig. 2. For example, the number of susceptibles in age group $a$ at time $t + \Delta t$ can be calculated as: $S_a(t + \Delta t) = S_a(t)$ − new infections with the ancestral strain in age group $a$ − new infections with the delta variant in age group $a$ − new infections with the novel variant in age group $a$ − new vaccinations in age group $a$ + members losing infection-induced immunity in age group $a$ + members losing vaccine-induced immunity in age group $a$.
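The multinomial sampling step of Section S2.1 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_departures`, the example rates, and the compartment size are hypothetical. The text does not give an expression for the probability of staying in the compartment, so the sketch assumes the usual competing-risks form, exp(−Δt · Σ rates).

```python
import numpy as np


def sample_departures(n_in_compartment, rates, dt, rng):
    """Sample how many members leave a compartment through each competing event.

    Returns an integer array of length len(rates) + 1: index 0 counts the
    members who stay, index i counts the members who leave through event i.
    """
    rates = np.asarray(rates, dtype=float)
    total_rate = rates.sum()
    if total_rate == 0.0:
        # No event can occur, so everybody stays.
        return np.concatenate(([n_in_compartment], np.zeros(len(rates), dtype=int)))
    # ASSUMPTION (not stated in the text): probability of staying during dt.
    p_stay = np.exp(-total_rate * dt)
    # Probability of leaving through event i: its share of the total rate
    # times the probability of leaving at all, as in Section S2.1.
    p_leave = rates / total_rate * (1.0 - p_stay)
    return rng.multinomial(n_in_compartment, np.concatenate(([p_stay], p_leave)))


rng = np.random.default_rng(seed=0)
# Hypothetical daily rates for the four ways of leaving the Susceptible
# compartment: ancestral, delta, novel variant, vaccination.
rates = [0.02, 0.01, 0.005, 0.003]
counts = sample_departures(10_000, rates, dt=1.0, rng=rng)
stayed, departures = counts[0], counts[1:]
print("stayed:", stayed, "left by event:", departures)
```

The departures sampled for every compartment would then be added to or subtracted from the compartment sizes to obtain the epidemic state at the next time step.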

S2.2 Rate of infection
For susceptible members in age group $a$, the rate of infection with the ancestral strain at time $t$ is calculated using the following parameters: the transmission parameter for the ancestral strain at time $t$, $\beta(t)$; the daily rate $c_{a,b}$ at which an average individual in age group $a$ contacts individuals in age group $b$ (see Section S2.3 for how $c_{a,b}$ is estimated); and the ratio, in $[0,1]$, of the infectiousness of a vaccinated individual to that of an unvaccinated individual when both are infected with the ancestral strain.
For susceptible members in age group $a$, the rates of infection with the delta variant and with the novel variant at time $t$ are calculated using the following additional parameters: two nonnegative ratios that give the infectiousness of the delta variant and of the novel variant relative to the ancestral strain, and two ratios in $[0,1]$ that give the infectiousness of a vaccinated individual infected with the delta variant or with the novel variant relative to an unvaccinated individual infected with the ancestral strain.
For vaccinated members in age group $a$, the rate of infection with variant $v \in \{1, 2, 3\}$ at time $t$ is calculated using the effectiveness of vaccination, a value in $[0,1]$, in providing immunity against infection with variant $v$.
To capture the effect of seasonal changes on the transmission of SARS-CoV-2, we allow the transmission parameter $\beta(t)$ to vary over time as a function of three parameters. Here, $\beta_0$ is the baseline transmissibility, which is not influenced by the seasonality effect, and the parameter $\beta_1$ represents the maximum magnitude of the seasonality effect during a year. The phase parameter $\phi$ determines when the effect of seasonality reaches its maximum or minimum and is included to provide additional flexibility in the modeling of the seasonality effect. We determined $\beta_0$, $\beta_1$, and $\phi$ by random draws from uniform distributions.

S2.3 Daily contact rate
If, in estimating $c_{a,b}$, age group $a \in A$ overlaps with multiple age groups from $\hat{A}$ (say $a'_1, a'_2, \ldots$), the daily contact rate $c_{a,b}$ is the average of $c_{a'_1,b}, c_{a'_2,b}, \ldots$.
Following the approach of Medlock et al. [3], we then ensured that the number of contacts between age groups is symmetric (i.e., $c_{a,b} N_a = c_{b,a} N_b$, where $N_a$ is the size of age group $a$), by using $\bar{c}_{a,b} = \frac{1}{2}\left(c_{a,b} + c_{b,a} \frac{N_b}{N_a}\right)$. The contact rate matrix $[\bar{c}_{a,b}]$ used in our model is shown in Table S1.
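The symmetrization step can be illustrated with a short NumPy sketch. It assumes that the raw contact rates are per-capita daily rates, so that the total number of contacts from one age group to another equals the rate times the size of the first group. The two-group matrix and population sizes are made-up numbers, and `symmetrize_contacts` is a hypothetical helper name.

```python
import numpy as np


def symmetrize_contacts(c, n):
    """Average each per-capita contact rate with its population-scaled reciprocal.

    c[a, b] is the daily rate at which a member of group a contacts group b,
    and n[a] is the size of group a. The result c_bar satisfies
    c_bar[a, b] * n[a] == c_bar[b, a] * n[b].
    """
    c = np.asarray(c, dtype=float)
    n = np.asarray(n, dtype=float)
    # Entry [a, b] of `reciprocal` is c[b, a] * n[b] / n[a].
    reciprocal = c.T * n[np.newaxis, :] / n[:, np.newaxis]
    return 0.5 * (c + reciprocal)


# Hypothetical two-group example (values are illustrative only).
c_raw = [[5.0, 2.0],
         [1.0, 4.0]]
sizes = np.array([1000.0, 3000.0])
c_bar = symmetrize_contacts(c_raw, sizes)

# Total daily contacts between groups; this matrix should be symmetric.
total_contacts = c_bar * sizes[:, np.newaxis]
print(c_bar)
print(np.allclose(total_contacts, total_contacts.T))
```

In this example the off-diagonal rates become 2.5 and about 0.833, so both groups report 2,500 total contacts with each other per day.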

S2.4 Effectiveness of control measures
We assume that control measures went into effect whenever the rate of hospital occupancy due to COVID-19 exceeded a first threshold and were lifted whenever this rate dropped below a second threshold [4]. These thresholds differ across simulated trajectories and were determined by random draws from the uniform distributions $U(1, 50)$ and $U(1, 50)$ per 100,000 population. We assume that the impact of control measures, $f(h)$, in reducing the effective reproductive number varies as a function of hospital occupancy due to COVID-19, $h$, according to the sigmoid function $f(h) = \frac{\bar{f}}{1 + e^{-4h/\bar{h}}}$, where $\bar{f}$ is the maximum impact of control measures and $\bar{h}$ is the maximum hospital capacity that could be allocated for COVID-19 patients (Fig. S3). We determined $\bar{f}$ and $\bar{h}$ by random draws from the uniform distributions $U(50\%, 85\%)$ [4][5][6] and $U(5, 15)$ per 100,000 population [7].
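The two mechanisms in Section S2.4, the sigmoid impact function and the on/off switching of control measures, can be sketched as below. The function and argument names are hypothetical, and the numerical values are illustrative picks from within the stated sampling ranges rather than values used in the study. The sketch also assumes that the threshold for lifting measures lies below the threshold for imposing them.

```python
import math


def control_impact(h, max_impact, h_max):
    """Sigmoid impact of control measures at hospital occupancy h.

    max_impact is the maximum impact of control measures and h_max is the
    maximum hospital capacity for COVID-19 patients (both per the text).
    """
    return max_impact / (1.0 + math.exp(-4.0 * h / h_max))


def update_control_status(active, h, on_threshold, off_threshold):
    """Switch measures on above on_threshold and off below off_threshold."""
    if not active and h > on_threshold:
        return True
    if active and h < off_threshold:
        return False
    return active


# Illustrative values (per 100,000 population where applicable).
max_impact, h_max = 0.70, 10.0
on_threshold, off_threshold = 20.0, 10.0

active = False
for occupancy in [5.0, 15.0, 25.0, 15.0, 8.0]:
    active = update_control_status(active, occupancy, on_threshold, off_threshold)
    impact = control_impact(occupancy, max_impact, h_max) if active else 0.0
    print(f"occupancy={occupancy:5.1f} active={active} impact={impact:.3f}")
```

With these values the measures switch on at an occupancy of 25, remain in place at 15 because the occupancy has not yet fallen below the lifting threshold, and switch off at 8.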

S3 Training decision tree models
We used the scikit-learn package to train decision tree models [8]. Before training the models, we created a balanced dataset in which the number of observations for which the hospital occupancy passed the specified threshold is equal to the number of observations for which this event did not occur. To avoid overfitting, we used a minimal cost-complexity pruning approach [9], where we determined the complexity parameter using 10-fold cross-validation to maximize the model accuracy (defined as the fraction of correct predictions) [10]. For each decision tree model, we chose the optimal value of the complexity parameter from {0, 0.005, 0.01, 0.015, 0.02, …, 0.1}.
We note that the age groups in our model are different from the age groups of the COVID Data Tracker. To estimate the vaccine coverage for age group 5-12, we divided the total number of vaccinations by 2/3 of the population size of age group <12. To estimate the vaccine coverage for age group 13-17, we divided the number of vaccinations in age groups 12-15 and 16-17 by the population sizes of these age groups. To estimate the vaccine coverage for age group 18-29, we divided the total number of vaccinations in age group 18-24 plus 1/3 of those in age group 25-39 by the same proportions of the population sizes of these age groups. To estimate the vaccine coverage for age group 30-49, we divided 2/3 of the vaccinations in age group 25-39 plus the total number in age group 40-49 by the same proportions of the population sizes of these age groups.
Fig. S1: Probability that an imported case is infected with the novel strain. We assumed that this probability increases over time according to a parametric function; the probability distributions assumed for its three parameters are listed in Table S3. The vertical dotted lines mark September 1, 2021.
The probability distributions assumed for the parameters are listed in Table S5. The vertical dotted lines mark September 1, 2021.
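The training procedure described in Section S3 (balancing, minimal cost-complexity pruning, and 10-fold cross-validation over the listed grid) might be sketched with scikit-learn as follows. Several parts are assumptions made for illustration, since the source does not specify them: the synthetic two-feature dataset, the use of random undersampling to balance the classes, and the mapping of the complexity parameter onto scikit-learn's `ccp_alpha` argument.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(seed=0)

# HYPOTHETICAL synthetic data standing in for simulated trajectories:
# two features and a binary label (1 = occupancy passed the threshold).
n_obs = 2000
X = rng.uniform(0.0, 30.0, size=(n_obs, 2))
y = (X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 5.0, n_obs) > 60.0).astype(int)


def balance_by_undersampling(X, y, rng):
    """Keep equally many observations of each class (assumed balancing method)."""
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y == 0)
    n_keep = min(len(pos), len(neg))
    keep = np.concatenate((rng.choice(pos, n_keep, replace=False),
                           rng.choice(neg, n_keep, replace=False)))
    rng.shuffle(keep)
    return X[keep], y[keep]


X_bal, y_bal = balance_by_undersampling(X, y, rng)

# Candidate complexity parameters: 0, 0.005, 0.01, ..., 0.1.
alphas = np.round(np.arange(0, 21) * 0.005, 3)

# 10-fold cross-validation choosing the value that maximizes accuracy.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"ccp_alpha": alphas},
    cv=10,
    scoring="accuracy",
)
search.fit(X_bal, y_bal)

best_tree = search.best_estimator_
print("selected ccp_alpha:", search.best_params_["ccp_alpha"])
print("cross-validated accuracy:", round(search.best_score_, 3))
print("number of leaves:", best_tree.get_n_leaves())
```

Larger values of the complexity parameter prune more aggressively and therefore yield smaller trees, which is what keeps the resulting classification rules simple enough to read visually.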