Data Retrieval
We obtained data on cumulative hospitalizations and active hospitalizations (number hospitalized on a given day) from the data dashboard for the South Dakota Department of Health (SD DOH) - https://doh.sd.gov/news/Coronavirus.aspx#SD. Data for cumulative hospitalizations began on 2020-03-08 and were entered by hand into a .csv each day (SD DOH only reports totals for the current day, not a timeline). Data for active hospitalizations were not released until 2020-04-20, when 56 people were actively hospitalized. Beginning on that date, we also updated our .csv with active hospitalizations each day. Data were collected up to 2020-07-22.
When data collection began, most cases and hospitalizations were located in Minnehaha County, SD. Therefore, during data collection, we noted hospitalizations in Minnehaha County versus the rest of South Dakota to capture any potential divergent trends under the assumption that the disease would spread more slowly across rural South Dakota. For these two areas, data are only available for cumulative hospitalizations, not for active hospitalizations. Minnehaha County has a population density of 619 people per km², which is > 50 times higher than the state average population density of 28 people per km² [5]. Comparing these two areas allowed us to model COVID-19 hospitalizations in a rural and urban setting within the same state.
Models
We estimated cumulative hospitalizations using a Bayesian model in which hospitalizations were modeled as a sigmoid function of time using the Weibull function [11, 12]. The Weibull function is derived from the Weibull cumulative distribution [13] and has been used widely in biology to model growth curves [14]. We chose the Weibull function because it is more flexible than the logistic function and is asymmetric around the inflection point [11, 15]. We fit the Weibull function to two sets of data that describe (1) the cumulative hospitalizations for the state of South Dakota and (2) the cumulative hospitalizations for subgroups of Minnehaha County and the rest of South Dakota. Because the data were counts with positive outcomes, we used a Poisson likelihood with a log-link.
Model 1
$$y_{i} \sim \mathrm{Poisson}(\lambda_{i})$$
$$\log \lambda_{i} = \alpha_{i}\left( 1 - \exp\left( -\frac{x_{i}}{\beta_{i}}\right)^{\gamma_{i}}\right)$$
$$\alpha \sim {\Gamma}(64,8)$$
$$\beta \sim {\Gamma}(2.9, 0.18)$$
$$\gamma \sim {\Gamma}(5.8,4.8)$$
With the above notation, \(y_{i}\) is the cumulative number of people hospitalized in South Dakota on the ith date, α is the asymptote, β is the inflection point, and γ is the slope at the inflection point. Gamma priors were used because each parameter must be positive and continuous.
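As a minimal sketch of the mean function in Model 1, the Python snippet below evaluates the curve on the log scale and back-transforms it through the log link. It assumes the conventional Weibull growth form, in which the exponent γ is applied to the ratio x/β inside the exponential. The function names are hypothetical and the parameter values are simply the prior means (shape divided by rate, assuming a shape and rate parameterization); this is not the authors' brms code.

```python
import math

def weibull_log_mean(x, alpha, beta, gamma):
    """Log of the expected cumulative count on day x (x >= 0).

    Assumes the standard Weibull growth form:
    alpha * (1 - exp(-(x / beta) ** gamma)).
    """
    return alpha * (1.0 - math.exp(-((x / beta) ** gamma)))

def expected_count(x, alpha, beta, gamma):
    """Back-transform through the log link to the count scale."""
    return math.exp(weibull_log_mean(x, alpha, beta, gamma))

# Illustrative values: prior means of Model 1, read as shape / rate.
alpha, beta, gamma = 64 / 8, 2.9 / 0.18, 5.8 / 4.8

# Expected cumulative hospitalizations every 20 days from day 0 to day 120.
curve = [expected_count(day, alpha, beta, gamma) for day in range(0, 121, 20)]
```

On the log scale the curve starts at 0 and levels off at α, so the expected count starts at 1 and approaches exp(α).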
Informative prior distributions were derived from the cumulative hospitalization curve in New York City (NYC Department of Health, https://www1.nyc.gov/site/doh/covid/covid-19-data.page). We derived the priors from New York City because NYC had nearly completed its hospitalization curve when South Dakota’s was still beginning and because the data were available as a timeline (many states either have not reported temporal hospitalization data or have not made the data easily extractable).
To derive prior distributions for South Dakota, we first fit the aforementioned model to NYC’s hospitalization curve. Before fitting the model, we multiplied NYC hospitalizations by 0.10 to put them on the scale of South Dakota’s population (which is ~10% of NYC’s population). We then fit the model to these adjusted hospitalizations using prior values of Γ(1.2,0.1) for α, Γ(0.25,0.005) for β, and Γ(1.4,0.3) for γ. Those reflect prior distributions with wide standard deviations that would represent a potential overload of South Dakota’s ~2000 hospital beds: 10,000 ± 5000 (mean ± sd) for α, 50 ± 100 for β, and 100 ± 50 for γ (Table 1).
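To show how the Γ(a, b) notation can be translated into a mean and standard deviation, the sketch below assumes a shape and rate parameterization, which is the convention Stan uses for its gamma distribution. Under that assumption, the β prior Γ(0.25, 0.005) corresponds to the 50 ± 100 quoted above. The helper name `gamma_mean_sd` is hypothetical.

```python
import math

def gamma_mean_sd(shape, rate):
    """Mean and standard deviation of a Gamma(shape, rate) distribution."""
    mean = shape / rate
    sd = math.sqrt(shape) / rate
    return mean, sd

# beta prior used for the NYC fit, read as Gamma(shape=0.25, rate=0.005).
beta_mean, beta_sd = gamma_mean_sd(0.25, 0.005)
print(round(beta_mean), round(beta_sd))
```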
Table 1 Posterior distributions from the New York City model and prior distributions for the South Dakota models
Model 2
To capture trends inside and outside of Minnehaha County, we fit the same model as before, but included an indicator variable with two levels (Minnehaha County or Outside Minnehaha County) for each of the three parameters.
$$y_{i} \sim \mathrm{Poisson}(\lambda_{i})$$
$$\log \lambda_{i} = \alpha_{j}\left( 1 - \exp\left( -\frac{x_{i}}{\beta_{j}}\right)^{\gamma_{j}}\right)$$
$$\alpha_{j} = \alpha_{minn} + \alpha_{rest}r_{i}$$
$$ \beta_{j} = \beta_{minn} + \beta_{rest}r_{i} $$
$$ \gamma_{j} = \gamma_{minn} + \gamma_{rest}r_{i} $$
$$\alpha_{minn} \sim {\Gamma}(49,7)$$
$$\alpha_{rest} \sim N(0,1)$$
$$\beta_{minn} \sim {\Gamma}(2.9, 0.18)$$
$$\beta_{rest} \sim N(0, 5)$$
$$\gamma_{minn} \sim {\Gamma}(5.8, 4.8)$$
$$\gamma_{rest} \sim N(0,0.5)$$
With the above notation, \(y_{i}\) is the cumulative number of people hospitalized on the ith date (\(x_{i}\)); \(\alpha_{j}\), \(\beta_{j}\), and \(\gamma_{j}\) are the parameters for each group j (Minnehaha County or the rest of South Dakota); \(X_{minn}\) are the priors for each parameter X (α, β, γ); \(X_{rest}\) are the priors for the difference in parameter values between Minnehaha and the rest of South Dakota; and \(r_{i}\) is an indicator variable that is 0 if the data are from Minnehaha County and 1 otherwise.
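The indicator-variable construction can be illustrated with a few lines of Python. This is only a sketch of the arithmetic in the three equations for α, β, and γ above; the function name and all numeric values are hypothetical and are not estimates from the paper.

```python
def group_parameters(r, minn, rest):
    """Return (alpha_j, beta_j, gamma_j) for one observation.

    r    -- indicator: 0 for Minnehaha County, 1 for the rest of the state
    minn -- dict with the Minnehaha values of 'alpha', 'beta', 'gamma'
    rest -- dict with the differences (rest of state minus Minnehaha)
    """
    return tuple(minn[name] + rest[name] * r
                 for name in ("alpha", "beta", "gamma"))

# Hypothetical values chosen only to show the arithmetic.
minn = {"alpha": 7.0, "beta": 16.0, "gamma": 1.25}
rest = {"alpha": -0.5, "beta": 4.0, "gamma": 0.25}

minnehaha_params = group_parameters(0, minn, rest)  # (7.0, 16.0, 1.25)
rest_params = group_parameters(1, minn, rest)       # (6.5, 20.0, 1.5)
```

Because the rest-of-state terms are differences, they can be negative, which is consistent with their normal rather than gamma priors.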
As before, prior values were chosen from a combination of prior information from NYC and from prior predictive simulation [16]. To do this, we simulated 300 cumulative hospitalization curves with mean values for each parameter derived from the fit of the NYC model. Because NYC has both a higher absolute population size and a higher population density (by 10-fold) than South Dakota, we adjusted prior means and standard deviations so that the prior predictive distributions estimated hospitalizations to have a maximum that is slightly below the maximum of NYC, but with standard deviations that still include positive prior probability for some extreme predictions (e.g., 50,000 cumulative hospitalizations) (Table 1). Figure 1 shows the prior predictions for both models.
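A prior predictive simulation of this kind can be sketched as follows. The snippet draws parameters from the Model 1 gamma priors, assumed here to be in shape and rate form, and computes 300 curves of expected cumulative hospitalizations. It assumes the conventional Weibull form with the exponent applied to x/β, omits the Poisson sampling step, and uses a hypothetical function name, seed, and day grid.

```python
import math
import random

def simulate_prior_curves(n_curves, days, priors, seed=1):
    """Draw parameters from Gamma(shape, rate) priors and return curves of
    expected cumulative hospitalizations, one list per prior draw."""
    rng = random.Random(seed)
    curves = []
    for _ in range(n_curves):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        a, b, g = (rng.gammavariate(shape, 1.0 / rate)
                   for shape, rate in (priors["alpha"], priors["beta"], priors["gamma"]))
        curves.append([math.exp(a * (1.0 - math.exp(-((x / b) ** g)))) for x in days])
    return curves

model1_priors = {"alpha": (64, 8), "beta": (2.9, 0.18), "gamma": (5.8, 4.8)}
curves = simulate_prior_curves(300, range(0, 201, 10), model1_priors)

# Inspect the spread of the simulated curves at the last simulated day.
final_values = sorted(curve[-1] for curve in curves)
print("median final value:", round(final_values[len(final_values) // 2]))
print("largest final value:", round(final_values[-1]))
```

Plotting the curves, or inspecting the largest final values as above, shows whether the priors still allow extreme outcomes.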
Markov Chain Monte Carlo
Each of the models above was specified in R (version 3.6.3; R Core Team 2020) using the brms package [17]. Posterior sampling was performed using Hamiltonian Monte Carlo in rstan (version 2.19.2) [18]. We fit four chains, each with 2000 iterations, discarding the first 1000 iterations of each chain as warm-up. Warm-up samples are similar to burn-in sampling, but are used in this case as an optimizer for the HMC algorithm. Chains were checked for convergence using trace plots to confirm overlap (Supplementary Information), and by ensuring that the Gelman-Rubin convergence diagnostic R̂ was < 1.1 [19].
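To illustrate what the R̂ threshold checks, the sketch below computes the classic Gelman-Rubin statistic for a single parameter from several chains. Stan reports a split-chain variant, so this shows the idea rather than reproducing the software's output. The simulated chains are stand-ins, not draws from the fitted models.

```python
import random
import statistics

def gelman_rubin(chains):
    """Classic potential scale reduction factor for one parameter.

    chains -- list of equal-length lists of post-warm-up draws.
    """
    n = len(chains[0])
    chain_means = [statistics.mean(c) for c in chains]
    within = statistics.mean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(chain_means)                      # B
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

rng = random.Random(42)
# Four chains of 1000 draws that sample the same distribution.
mixed = [[rng.gauss(8.0, 0.1) for _ in range(1000)] for _ in range(4)]
# Four chains where one is centred elsewhere (a failure to converge).
stuck = [[rng.gauss(8.0 + shift, 0.1) for _ in range(1000)]
         for shift in (0.0, 0.0, 0.0, 0.5)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Well-mixed chains give a value close to 1, whereas a chain stuck in a different region inflates the between-chain variance and pushes the statistic well above 1.1.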
Posterior Prediction
To forecast cumulative hospitalizations, we used the posterior predictive distributions from each model by first solving for the fitted values across each iteration of the posterior:
$$yfit_{i}^{(k)} = \exp\left( \alpha_{i}^{(k)}\left( 1 - \exp\left( -\frac{x_{i}}{\beta_{i}^{(k)}}\right)^{\gamma_{i}^{(k)}}\right)\right)$$
where k is the kth iteration from the posterior distribution and i is the ith date. Posterior predicted values were estimated by drawing each \(ypred_{i}^{(k)}\) from the Poisson distribution:
$$ypred_{i}^{(k)} \sim \mathrm{Poisson}(yfit_{i}^{(k)})$$
We then summarized the mean, median, standard deviation and credible intervals (50 and 95%) across the posterior distribution of fitted and predicted values. For visualization, we plotted fitted values within the range of the data and predicted values beyond the range of the data.
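The two steps above (fitted values per iteration, then Poisson draws) and the interval summaries can be sketched in Python. The posterior draws here are simulated stand-ins, since real draws would come from the fitted model. The Weibull form with the exponent applied to x/β is an assumption, and the helper names and the simple rank-based quantile rule are hypothetical choices.

```python
import math
import random

def poisson_draw(lam, rng):
    """Exact Poisson draw: count unit-rate arrivals in the interval [0, lam]."""
    count, t = 0, rng.expovariate(1.0)
    while t <= lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def quantile(sorted_values, p):
    """Rank-based quantile on an already sorted list."""
    idx = min(len(sorted_values) - 1, max(0, round(p * (len(sorted_values) - 1))))
    return sorted_values[idx]

def summarize(values):
    """Median plus 50% and 95% intervals across posterior iterations."""
    s = sorted(values)
    return {"median": quantile(s, 0.5),
            "ci50": (quantile(s, 0.25), quantile(s, 0.75)),
            "ci95": (quantile(s, 0.025), quantile(s, 0.975))}

def yfit(x, a, b, g):
    """Fitted value on the count scale for one posterior iteration."""
    return math.exp(a * (1.0 - math.exp(-((x / b) ** g))))

rng = random.Random(7)
# Stand-in posterior draws of (alpha, beta, gamma); not values from the paper.
draws = [(rng.gauss(5.0, 0.05), rng.gauss(30.0, 1.0), rng.gauss(1.5, 0.05))
         for _ in range(400)]

day = 60
fitted = [yfit(day, a, b, g) for a, b, g in draws]
predicted = [poisson_draw(mu, rng) for mu in fitted]
fit_summary, pred_summary = summarize(fitted), summarize(predicted)
```

The predicted intervals are wider than the fitted ones because they add Poisson sampling variation to the parameter uncertainty.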
Estimating Active Hospitalizations
To estimate active hospitalizations from the cumulative hospitalization curve, we first derived daily incidence ϕ for each iteration of \(yfit_{i}^{(k)}\), in which
$$\phi_{i}^{(k)} = yfit_{i}^{(k)} - yfit_{i-1}^{(k)}$$
We then summed incidence over the previous 5, 10, 12, or 15 days to estimate variable lengths of hospital stays:
$$\omega_{i}^{(k)} = \phi_{i}^{(k)} + \phi_{i-1}^{(k)} + \phi_{i-2}^{(k)} + \dots + \phi_{i-n}^{(k)}$$
where \(\omega_{i}^{(k)}\) is the number of people actively hospitalized on the ith day for the kth iteration, \(\phi_{i}^{(k)}\) is the incidence on the ith day for the kth iteration, and n is 5, 10, 12, or 15. These lengths of stay were chosen to capture the range of reported hospital stay lengths from the literature [20].
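For a single posterior iteration, the differencing and windowed sum can be sketched as below. The code follows the equation for ω literally, so each sum runs from ϕ at day i back to ϕ at day i − n and uses fewer terms near the start of the series. The function names and the toy cumulative series are hypothetical.

```python
def daily_incidence(cumulative):
    """phi_i = yfit_i - yfit_{i-1}; the first day has no predecessor and is skipped."""
    return [today - yesterday
            for yesterday, today in zip(cumulative, cumulative[1:])]

def active_from_incidence(incidence, n):
    """omega_i = phi_i + phi_{i-1} + ... + phi_{i-n}, using as many past days as exist."""
    return [sum(incidence[max(0, i - n): i + 1]) for i in range(len(incidence))]

# Toy cumulative curve for one iteration (not real data).
cumulative = [1, 3, 6, 10, 15, 21, 28, 36]
phi = daily_incidence(cumulative)         # [2, 3, 4, 5, 6, 7, 8]
omega = active_from_incidence(phi, n=2)   # [2, 5, 9, 12, 15, 18, 21]
```

Repeating this for every iteration and for each value of n gives a posterior distribution of active hospitalizations under each assumed length of stay.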
We then plotted these predictions against active hospitalizations reported by the South Dakota Department of Health. For the group levels (model 2), we performed the same calculations as above, but for each group. In addition, we also estimated the state-level hospitalizations from model 2 by summing the predictions from each group. This allowed us to compare predictions when only state-level data were available versus predictions with data available for different areas of the state.
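Summing group predictions to the state level can be sketched as below. The snippet assumes the sum is taken iteration by iteration, so that draw k for Minnehaha County is paired with draw k for the rest of the state, which preserves the joint posterior uncertainty. The function name and numbers are hypothetical.

```python
def state_level(minnehaha_draws, rest_draws):
    """Add group predictions iteration by iteration.

    Each argument is a list of posterior draws, and each draw is a list of
    predictions by date. Draw k of one group is paired with draw k of the other.
    """
    return [[m + r for m, r in zip(m_curve, r_curve)]
            for m_curve, r_curve in zip(minnehaha_draws, rest_draws)]

# Two iterations, three dates each (toy numbers, not results from the paper).
minnehaha = [[10, 20, 30], [12, 22, 33]]
rest = [[5, 8, 12], [4, 9, 15]]
state = state_level(minnehaha, rest)   # [[15, 28, 42], [16, 31, 48]]
```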
Parameter Change Over Time
We re-fit the model each day as data were released. To visualize how parameter values changed over time as data were added, we plotted posterior predictions and parameter values from model runs in 20-day intervals. Twenty days was chosen arbitrarily to allow for visual clarity in the plots.
Iteration Sensitivity and Prior Sensitivity
To determine how sensitive results were to iterations and alternative prior specifications, we re-ran each model using 20,000 iterations and also altered the priors to make them either less informative or more informative by widening or tightening the standard deviations (Table S2). Increasing iterations had no impact on parameter estimates (Table S1). Results from alternative priors are presented in Figs. S1 and S2. The results in the main text were robust to alternative prior specifications. The one exception was the set of tighter priors in model 2, under which the estimate of maximum hospitalizations was \(\sim \)130 patients lower than the estimate from the main model.
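One way to widen or tighten a gamma prior while holding its mean fixed is to convert a target mean and standard deviation back to shape and rate. The sketch below does this for the Model 1 prior on α, assuming the shape and rate parameterization. The doubling and halving of the standard deviation are illustrative only and are not the alternative priors listed in Table S2.

```python
def gamma_from_mean_sd(mean, sd):
    """Shape and rate of the gamma distribution with the given mean and sd."""
    shape = (mean / sd) ** 2
    rate = mean / sd ** 2
    return shape, rate

# Model 1 alpha prior, Gamma(64, 8): mean 8, sd 1 under shape / rate.
original = gamma_from_mean_sd(8.0, 1.0)   # (64.0, 8.0)
wider = gamma_from_mean_sd(8.0, 2.0)      # (16.0, 2.0)
tighter = gamma_from_mean_sd(8.0, 0.5)    # (256.0, 32.0)
```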