Effects of B.1.1.7 and B.1.351 on COVID-19 Dynamics: A Campus Reopening Study

The timing and sequence of safe campus reopening has remained the most controversial topic in higher education since the outbreak of the COVID-19 pandemic. By the end of March 2020, almost all colleges and universities in the United States had transitioned to an all online education and many institutions have not yet fully reopened to date. For a residential campus like Stanford University, the major challenge of reopening is to estimate the number of incoming infectious students at the first day of class. Here we learn the number of incoming infectious students using Bayesian inference and perform a series of retrospective and projective simulations to quantify the risk of campus reopening. We create a physics-based probabilistic model to infer the local reproduction dynamics for each state and adopt a network SEIR model to simulate the return of all undergraduates, broken down by their year of enrollment and state of origin. From these returning student populations, we predict the outbreak dynamics throughout the spring, summer, fall, and winter quarters using the inferred reproduction dynamics of Santa Clara County. We compare three different scenarios: the true outbreak dynamics under the wild-type SARS-CoV-2, and the hypothetical outbreak dynamics under the new COVID-19 variants B.1.1.7 and B.1.351 with 56% and 50% increased transmissibility. Our study reveals that even small changes in transmissibility can have an enormous impact on the overall case numbers. With no additional countermeasures, during the most affected quarter, the fall of 2020, there would have been 203 cases under baseline reproduction, compared to 4727 and 4256 cases for the B.1.1.7 and B.1.351 variants. Our results suggest that population mixing presents an increased risk for local outbreaks, especially with new and more infectious variants emerging across the globe. Tight outbreak control through mandatory quarantine and test-trace-isolate strategies will be critical in successfully managing these local outbreak dynamics.


Motivation
More than half a million cases of COVID-19 have been reported at United States colleges and universities since the beginning of the pandemic. Although many students, staff, and faculty have been immunized since vaccination became available in December 2020, more than 120,000 of all cases have occurred since January 2021 [21]. This is alarming, especially in view of the newly emerging variants of COVID-19 that are more infectious, and potentially more dangerous to the younger population [4]. American institutions of higher education remain faced with a difficult choice: maintaining campus life while exposing staff and students to a dangerous disease, or closing the campus at the risk of educational, mental, and financial disruption [19]. In most cases, university administrations soon realized that the risks of maintaining normal in-person learning during the pandemic were too grave and switched to an all-online instruction. Stanford University was the first major university to announce this transition on March 6, 2020. Within the following week, many colleges and universities sent home their students and by April 4, 2020, 1,400 university campuses had been closed [20]. While it became rapidly clear that campuses would not reopen for the remainder of the academic year, the next major question was to decide on possible reopening for the fall of 2020, especially in the light of the increase in prevalence among young adults. Indeed, young adults aged 20 through 29 years contributed substantially to the increase in COVID-19 infections across the southern United States during June 2020 [1].
One of the main risks associated with campus reopening is bringing students from high prevalence regions to campus and allowing them to mix with other students, possibly from low prevalence regions [9,17]. Despite various mitigating strategies including testing, quarantine, contact tracing, facemask usage, and disinfection [18,30], the risk of creating an uncontrollable outbreak and spreading the disease to a large part of the campus population remains exceptionally high as demonstrated by the cautious reopening followed by the rapid closing of the University of North Carolina at Chapel Hill in August 2020 [29]. Despite some initial hopes of inviting student back to campus for the winter term, Stanford University remained closed to the majority of its students until the end of winter 2020.
The objectives of this study are twofold: First, we perform a retrospective study to evaluate the risks that would have been associated with the reopening of Stanford University in the spring, summer, and fall of 2020, and winter of 2021. Our analysis accounts for the full regional diversity of the student population by tracking their origin states and each state's COVID-19 prevalence. We infer the critical parameters of the underlying network SEIR model using Bayesian analysis and estimate the number of students who would have been infected if the campus had fully reopened. Second, we complement our analysis by exploring the possible effect of variants on the overall disease dynamics.

Epidemiology Modeling
We model the local epidemiology of the COVID-19 outbreak using an SEIR model [2,6,14,23] with four compartments, the susceptible, exposed, infectious, and recovered populations, governed by the set of ordinary differential equations, Here (•) = d (•)∕d t denotes the time-derivative of the compartment (•) and N = S + E + I + R is the total population. Three parameters govern the transition from one compartment to the next: the contact rate , the latent rate , and the infectious rate . They are the inverses of the contact period B = 1∕ , the latent period A = 1∕ , and the infectious period C = 1∕ . To keep the number of parameters manageable, we assume that latent rate = 1∕2.5 days −1 and the infectious rate = 1∕6.5 days −1 are disease-specific for COVID-19, and constant in space and time [10,13,27]. To account for social behavioral changes during the course of the pandemic, we employ a dynamic contact rate = (t) that varies both in space and time [15,24]. For easier interpretation, we express the contact rate, in terms of the dynamic effective reproduction number (t) .
Here, we adopt a stochastic process approach to define (t) and construct a Gaussian process latent variable model. We assume a one-mean Gaussian process prior [25], which draws function values from a multivariate normal distribution. This normal distribution is parameterized in terms of the covariance function k [ t, t � ] and assumes that (t) is constant within a time window of five days. To account for a smooth non-linear mapping from the latent to the data space, we choose an exponentiated quadratic form of the covariance function as where 2 and 2 denote two kernel hyperparameters [7,12].

Mobility Modeling
We represent the disease dynamics in each state by its own local SEIR model and connect all 50 states to Stanford campus using a discrete mobility network [14]. Figure 1 shows that this network consists of n nd = 50 + 1 nodes, which represent the individual states, and n eg = 50 weighted edges, which represent the strength of their connection to Stanford campus. We approximate the weights of the edges by the number incoming students from each state and represent this information in the adjacency matrix A ij and in the degree matrix D II , The difference between the degree matrix D IJ and the adjacency matrix A IJ defines the weighted graph Laplacian L IJ , [3,14,23], Since we focus on the return to campus, we only simulate the mobility to Stanford campus and neglect the intrastate mobility between individual states. This implies that only a single row and column of the adjacency matrix and degree matrix are populated [16], A I1 = A 1J ≠ 0 and D 11 ≠ 0 , while all other entries are zero, A IJ = 0 for all I, J ≠ 1 and D II = 0 for all I ≠ 1 . We assume that all students arrive at the beginning of a given term rather than continuously over time. In this sense, we only use the network structure to calculate the influx of students at the first day of the term as an initial condition for the model.

Bayesian Inference
Our model local SEIR features four parameters: two parameters that define the initial exposed and infectious populations E 0 and I 0 , and two kernel-hyperparameters for the Gaussian process prior 2 and l 2 to model the dynamics of the effective reproduction number (t) . This implies that our network SEIR model introduces four model parameters for each state, We use Bayes' theorem to estimate the posterior probability distribution of the parameters , such that the statistics of the simulated daily new cases ΔI( , t) agree with the reported daily new cases ΔÎ( , t), Here P(Î(t) | ) is the likelihood, the conditional probability of the data Î (t) for given fixed parameters ; P( ) is the prior, the probability distribution of the model parameters ; P(Î(t)) is the marginal likelihood; and P( |Î(t)) is the posterior, the conditional probability of parameters for given data Î (t).
Likelihood The likelihood function evaluates the goodness of fit between the simulated new case numbers I( , t) = I( , t + 1) − I( , t) and the reported new case numbers Î (t) . For the individual likelihood functions L(Î(t i ) | ) , at each time point t i , we adopt a Student's t-distribution with a case-number-dependent width, .
We choose this distribution because it features heavy tails that makes the parameter inference more robust with respect to outliers [5,11]. Here, we assume a half Cauchy distribution for the likelihood width between the simulated and reported new cases I( , t) and Î (t), The product of all i = 0.., n likelihood functions L(Î(t i ) | ) , evaluated at the daily time points t i , defines the overall likeli- where the product symbol ∏ n i=0 denotes the multiplication of all i = 0.., n daily likelihoods across the time window of the simulation.
Priors. For the prior probability distributions P( ) , we select log-normal distributions for the initial exposed and infectious populations, and and the two kernel hyperparameters that define the exponentiated quadratic form of the covariance function for the Gaussian process model, a relatively weakly-informative half Cauchy prior that controls the amplitude and confines the distribution to positive values as and and the characteristic length-scale with a Gamma hyperprior, which we introduce for greater generality as Posteriors We evaluate Bayes' theorem (8) using Markov Chain Monte Carlo sampling [15,22,24] to obtain the posterior distribution P( |Î(t)) of the parameters = { E 0 , I 0 , 2 , 2 } in terms of the likelihood P(Î(t)| ) and prior probability distributions P( ) . We adopt the NO-U-Turn sampler [8] implemented in the Python package PyMC3 [26]. From the converged posterior distributions, we sample multiple combinations of parameters . From these posterior samples, we quantify and plot the means and (10) ∼ HalfCauchy( = 1) .

Fig. 1
Mobility modeling. Discrete graphs of the returning student network model with n nd = 51 nodes and n eg = n nd − 1 edges that represent the mobility of students returning to Stanford University campus 95% credible intervals of the reproduction number (t) and the simulated daily new cases I( , t) for all 50 states.

COVID-19 Outbreak and Mobility Data
For the COVID-19 outbreak data, we draw the COVID-19 history for all 50 states and Santa Clara County [28]. From these data, we extract the daily newly cases Î (t) . To eliminate weekday-weekend fluctuations, we smoothen the outbreak data by applying a moving averaging window of seven days. For the mobility data, we draw the student demographics of all Stanford University undergraduates enrolled in 2020, broken down by their origin state and their year of study, frosh, sophomore, junior, or senior year.

Campus Opening Forecast
We analyze the COVID-19 dynamics for campus opening at four different quarters and for the three currently most common virus variants. To model the effect of campus opening, we allow students to travel back to Stanford campus using our network SEIR model. We investigate campus opening for the spring quarter on April 6, 2020, the summer quarter on June 22, 2020, the fall quarter September 14, 2020 and the winter quarter January 11, 2021. For each quarter, we estimate the effects of opening by assuming the dynamic reproduction number (t) of Santa Clara County, that we infer directly from the local COVID-19 outbreak data from Santa Clara County. To model the effects of three different virus variants, in addition to the baseline simulation, we consider the B.1.1.7 variant with an increased transmissibility of 56% and the B.1.351 variant with an increased transmissibility of 50% [4]. For both virus variants, we scale our inferred effective reproduction number of Santa Clara by multiplication with the increased transmissibility and perform forward simulations for all four quarters, spring, summer, fall, and winter. Figure 2 illustrates the outbreak dynamics of COVID-19 in the United States from the beginning of the outbreak until January 17, 2021. For each state, the bottom graph represents the new confirmed cases I(t) as dots and our inferred dynamic SEIR model fit Î (t) as orange curves. The shaded regions represent our inferred 95% credible interval on the daily new cases. The top graph represents the inferred evolution of the effective reproduction number (t) . The solid lines describe the median values of (t) and the shaded areas represent the 95% credible intervals. The side-by-side comparison of all 50 states showcases the different disease dynamics and the varying timing of the COVID-19 waves in each state. While most states saw high effective reproduction numbers and a strong increase in new COVID-19 cases during the months of November and December 2020, the early outbreak dynamics displayed larger regional differences in waves between the individual states. These differences are of particular interest for informing the intrinsic risk that inviting students back to campus entails for a university campus like Stanford. For example, New York had a large outbreak at the beginning of the pandemic but managed to keep its effective reproduction numbers low until the end of the summer. In contrast, states like Arizona, California, Florida, Nevada, South Carolina and Texas saw a high regional COVID-19 prevalence by the end of June 2020, around the time Stanford's summer quarter was about to start. To invite undergraduates from all of the United States back to campus, we need to consider the epidemiological status of each states at the time a new quarter starts, which is showcased in Fig. 3. The left plot summarizes the effective reproduction number variation and the right plot showcases the exposed and infectious population sizes throughout all the states at the beginning of each quarter. Figure 3 shows that, at the beginning of the spring, summer, and fall quarters, most states across the United States saw imminent outbreaks with effective reproduction numbers well above one. At the same time, the relative sizes of the exposed and infectious populations were still relatively small. At the start of the winter quarter, the opposite situation occurred as most states were recovering from a new COVID-19 wave. Reproduction were well below one in most states, but the relative exposed E and infectious I population sizes were still significantly larger than at the beginning of the other quarters.

COVID-19 Outbreak Dynamics in the United States
For this study, these population sizes are of most interest, as they determine how many infectious students will return to campus. The spring, summer, fall, and winter quarter started on April 6, June 22, September 14, 2020 and Jan 11, 2021. Table 1 summarizes the infectious and exposed populations in each state for each of these potential campus reopening dates. On April 6, the largest exposed and infectious populations were 0.43% in New York, 0.34% in New Jersey, and 0.20% in Massachusetts. Consequently, for the spring quarter, students returning from the East Coast had the highest chances of bringing COVID-19 back to campus. On June 22, we inferred a 0.35%, 0.22% and 0.20% exposed and infectious population in Arizona, Florida, and South Carolina. On September 14, students returning from North Dakota, South Dakota, and Wisconsin had a 0.41%, 0.29% and 0.26% chance to be exposed or infectious. On Jan 11, peak exposed and infectious populations were recorded at 1.04%, 0.93%, and 0.84% in Arizona, California, and South Carolina. These numbers clearly show the significantly higher risk in inviting students back to campus at the beginning of the winter quarter: The average exposed and infectious populations across all states were 0.54% at the start of the winter quarter, compared to 0.13%, 0.08% and 0.06% for the fall, summer, and spring.   Percentages of infectious, and exposed populations for four consecutive quarters and number of 1st, 2nd, 3rd, 4th, and 5th year undergraduates originating from each state

COVID-19 Influx into Stanford Campus
The risk of bringing students back to campus depends on both the regional covid prevalence and the state-specific origin of the returning student population. The last five columns of Table 1 report the origin of Stanford University's current undergraduate population. Figure 4 showcases the inferred amount of exposed and infectious students returning to campus from each state. We estimate the COVID-19 influx by multiplying the state-specific prevalence with the number of students returning from each state, and summarize these values in Table 2. For a potential opening in the spring quarter, we inferred one COVID-19 exposed or infectious student returning from New York, one from California, and one from New Jersey. For a summer opening, our results suggest that we could have expected three exposed or infectious students returning from California and one exposed or infectious student from Texas and Florida. For a fall opening, we would have seen two exposed or infectious individuals returning from California and one from Texas. At the beginning of the winter quarter, the high COVID-19 prevalence throughout the United States would have resulted in 24 exposed or infectious students returning from within the state, three from Texas, two from New York, and one from Arizona, Florida, Georgia, Illinois, New Jersey, Massachusetts, Washington, Virginia, North Carolina, Maryland, and Colorado each. Figure 5 illustrates the local outbreak dynamics for Santa Clara County, home of Stanford University. The bottom graph depicts the reported daily new cases I(t) as dots and our inferred dynamic SEIR model fit Î (t) as orange curve. The shaded regions highlight the inferred 95% credible interval on the new confirmed cases. The top graph showcases the inferred temporal evolution of the effective reproduction number (t) in Santa Clara County, which serves as the input to our forward predictions in the following example. The solid lines describe the median values of (t) and the shaded area depicts the 95% credible interval. Similar to the state of California in Fig. 2, Santa Clara County saw two significant COVID-19 prevalence waves, the first at the beginning of July and the second from early November to the beginning of December. Assessment of new B.1.1.7 Figure 6 represents a scenario for the wild-type SARS-CoV-2, where no new COVID-19 virus variants are brought into campus by the returning students. In this scenario, the amount of new cases per day within the first 100 days after campus reopening remains limited to a maximum of two students if Stanford University reopened in the spring of 2020. For the same scenario, we would expect a maximum of three, six, and five cases per day after reopening campus for the summer, fall, and winter quarters. In total, reopening for spring, summer, or fall quarter would have resulted in 58 cases, 162 cases, and 203 cases during the first 100 days after reopening. For reopening campus at the beginning of the winter quarter, a total of 86 cases would have to be expected by March 3, 2021. Figure 7 represents the scenario where returning students bring in the new B.1.1.7 COVID variant which has an increased transmissibility of 56%. In this scenario, our Fig. 3 Epidemiological status of the incoming student population for four consecutive quarters. Effective reproduction statistics and size of the exposed and infectious populations at the beginning of the spring, summer, fall, and winter quarters forward predictions estimate significantly higher daily confirmed case numbers compared to the previous scenario. If students would have brought in the new B.

Conclusion
The COVID-19 pandemic remains an enormous challenge for higher education, especially for residential college campuses with a diverse student population from all across the country. Most institutions have cautiously limited student access to campus throughout the first year of the pandemic, but are now gradually inviting their students back to campus. In view of massive immunization campaigns, it seems natural to assume that it would now be safe to now transition back to in person instruction. However, since the early outbreak of the COVID-19 pandemic in December 2019, new and more contagious variants of the virus have emerged and COVID-19 is now beginning to spread predominantly into younger age groups. Taken together, these events bring back the question about safe campus opening.
In this study, we explore the effects of campus reopening for the example of Stanford University, a residential campus in Santa Clara County, California, with an undergraduate enrollment of 6479 students. For four consecutive quarters, spring, summer, fall, and winter, we virtually open Stanford campus and invite the undergraduate population from all 50 United States to return to campus under the current outbreak conditions of their home states. We use a Bayesian analysis to infer the disease parameters of each state using reported case data and extract the exposed and infectious populations at the first day of class for all four quarters. We find that 4, 7, 6, and 46 infected students would have brought the SARS-CoV-2 virus to campus if Stanford had opened in spring, summer, fall, and winter.
With these initial conditions, we perform a forward analysis and compare three scenarios: the true outbreak dynamics under the initial wild-type SARS-CoV-2, and the hypothetical outbreak dynamics under the new B.  Incoming infectious and exposed students from the United States at the beginning of the spring, summer, fall, and winter quarters   6 Forward prediction of new COVID cases for the wild-type SARS-CoV-2. Each simulation begins with the number of returning exposed and infectious cases on the first day of class and uses the inferred effective reproduction number (t) of Santa Clara County for the forward prediction throughout the entire quarter Each simulation begins with the number of returning exposed and infectious cases on the first day of class and uses the inferred effective reproduction number (t) of Santa Clara County, scaled by the 56% higher infectiousness of B.1.1.7, for the forward prediction throughout the entire quarter 4727 for B.1.1.7 and 4256 for B.1.351, recover from infection throughout the fall quarter to protect the community from further spreading, although the reproduction number remains well above one. While these case numbers are certainly only an upper limit, since we neglect additional countermeasures and test-trace-isolate strategies, they clearly highlight the effects of more infectious variants of the virus.
Infectious disease experts agree that it is highly likely that more infectious variants of the virus will emerge in the near future. Our study shows that, even if a new variant is only a few percent more contagious than the wild-type version of SARS-CoV-2, this can have tremendous consequences on the overall case numbers, and, ultimately, on the health care system. College campus administrators will need to maintain tight outbreak measures, quarantine, and isolation, to continue to successfully manage their local outbreak dynamics as more campuses reopen towards the fall. Each simulation begins with the number of returning exposed and infectious cases on the first day of class and uses the inferred effective reproduction number (t) of Santa Clara County, scaled by the 50% higher infectiousness of B.1.351, for the forward prediction throughout the entire quarter