Data
The data that we used to explore the relationship between international student enrollment and changes in higher education institutions’ reopening plans during July 2020 derived from three primary sources. First, daily information about institutional reopening plans for public and private not-for-profit 4-year institutions came from the College Crisis Initiative (C2i), a research initiative at Davidson College that has tracked institutional responses to the COVID-19 crisis since March 1, 2020 (for more information, see https://collegecrisis.org/). C2i began collecting data on reopening plans in June 2020, prior to the month of interest in this study. To collect these data, researchers at the College Crisis Initiative first manually checked all institutional websites to record their initial reopening plans on the first day of data collection. On subsequent days, a spider, a web crawler that automatically checks the written content on webpages, conducted a daily search of each institution’s website for changes in an extensive list of previously defined keywords that would signal a change in the institution’s reopening plan (e.g., “COVID-19,” “reopening,” “face-to-face,” “fall”). If this spider detected any change in the language of an institution’s website, a researcher then manually checked the website content and recorded any change in the institution’s reopening plan, marking the exact day that the institution’s instructional plan changed.
In this study, we use a panel dataset, with institutions represented 31 times, once for each day of July 2020—the peak of immigration uncertainties for international students in the US. Our dataset comprised 1968 institutions that were not already fully online before the pandemic, 723 public institutions and 1245 private not-for-profit (for a total of 61,008 institution-day observations).
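The institution-day structure described above can be sketched as follows. This is a minimal illustration, not the code used to build the actual dataset; the identifiers (`inst_0000`, etc.) and field names are hypothetical placeholders.

```python
from datetime import date, timedelta
from itertools import product

# Hypothetical institution identifiers; the panel described in the text has 1,968 institutions.
institution_ids = [f"inst_{i:04d}" for i in range(1968)]

# Every calendar day of July 2020 (31 days).
july_days = [date(2020, 7, 1) + timedelta(days=d) for d in range(31)]

# One row per institution per day (long, or "institution-day", format).
panel = [
    {"institution_id": inst, "day": day}
    for inst, day in product(institution_ids, july_days)
]

print(len(panel))  # 1968 * 31 = 61008
```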
Outcome variable
For our study, we were interested in a change in an institution’s reopening plan that involved a shift to more in-person instruction, our outcome variable. This binary indicator captured several kinds of change, such as a shift from fully online learning to a hybrid model of course delivery, a shift from a hybrid model to fully in-person course delivery, or a shift from no reopening plan to one that involved an in-person learning component (i.e., hybrid course delivery, primarily or fully in-person instruction). In other words, any shift in instructional delivery that involved a commitment to additional in-person teaching represented a shift that would facilitate international student enrollment.
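One way to code such an indicator is sketched below. The category labels and their numeric ranking are assumptions made for illustration; the actual C2i plan categories may differ. "No plan" and "fully online" share the lowest rank so that a move between them does not count as added in-person instruction.

```python
# Hypothetical ranking of plan categories by amount of in-person instruction.
# "no plan" and "fully online" both involve no in-person commitment.
IN_PERSON_RANK = {
    "no plan": 0,
    "fully online": 0,
    "hybrid": 1,
    "primarily in person": 2,
    "fully in person": 3,
}


def more_in_person(previous_plan: str, current_plan: str) -> int:
    """Return 1 if the new plan commits to more in-person teaching, else 0."""
    return int(IN_PERSON_RANK[current_plan] > IN_PERSON_RANK[previous_plan])


print(more_in_person("fully online", "hybrid"))     # 1
print(more_in_person("hybrid", "fully in person"))  # 1
print(more_in_person("no plan", "fully online"))    # 0
print(more_in_person("hybrid", "fully online"))     # 0
```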
Predictor of interest and covariates
Our predictor of interest, the logged percentage of total student enrollment made up of international students, was derived from a second data source, the National Center for Education Statistics’ Integrated Postsecondary Education Data System (IPEDS) survey data. IPEDS does not include a measure of international student enrollment specifically; instead, it classifies international students in the broader category of non-resident students, which also includes non-US citizens who are in the US under a status that does not allow them to remain indefinitely (such as those under the Deferred Action for Childhood Arrivals (DACA) policy) [Footnote 1]. We therefore take IPEDS’ non-resident category as a proxy for international student enrollment.
We additionally derived several covariates for our study from IPEDS, namely whether the institution offered graduate degrees (defined as Master’s, Doctoral, or advanced professional degrees), the region of the US where an institution was located, whether the institution was located in a rural area [Footnote 2], whether the institution had experienced a steep enrollment decline over the previous academic year (defined as a decline in enrollment of 10% or greater, following Kelchen, 2020) [Footnote 3], the percentage of an institution’s total revenue that was tuition/fee revenue (often referred to as tuition reliance and used as a proxy for an institution’s financial precarity), and, in the case of public institutions, the percentage of total revenue consisting of state appropriations [Footnote 4]. This information came from datasets representing the 2017–2018 academic year for financial variables and Fall 2018 for all other variables (and Fall 2017, in the case of our enrollment decline variable, which required two years of data to calculate).
Since these variables are collected annually for inclusion in IPEDS, they do not vary over time during the period of observation in this study (July 1–31, 2020) and were consequently classified as time-invariant covariates for analytic purposes. IPEDS was also our data source for determining if an institution was a public or private not-for-profit 4-year institution. As part of this study’s first dataset, C2i provided an additional time-invariant covariate representing whether the institution was in a state with a Republican (GOP) governor. We included this covariate to account for political pressure that an institution’s leader might feel to reopen campus for face-to-face instruction. Finally, one of our covariates that did vary over time during our period of observation was the daily cumulative COVID-19 case counts in the county where an institution of higher education was located. This information was taken from a third data source, the New York Times’ COVID-19 data files (for more information, see https://developer.nytimes.com/covid). These data files draw from state and local governments, as well as health departments, with the goal of providing a complete record of the COVID-19 outbreak.
We included in our dataset an additional variable accounting for an institution’s selectivity, defined as the percentage of applicants that it admitted, derived from IPEDS data. Because this variable was unreported for 173 private institutions in the dataset, we did not include it in our primary analyses, but rather used it in one of our robustness checks, described in the following section.
Taken together, these covariates fall into two primary categories: those that potentially relate to an institution’s likelihood of reopening regardless of its financial precarity and possible reliance on international student enrollment, and those that relate directly to the extent to which an institution is resource dependent. Covariates in the first category include whether the institution offered graduate degrees, GOP gubernatorial status, county-level COVID-19 case counts, geographic region, and rurality of the institution. This last variable in particular speaks to population density in an institution’s geographic location, which may have related to the spread of COVID-19. Variables in the second category include the percentage of revenue from tuition/fees, the percentage of revenue from state appropriations (for public institutions), and whether the institution had experienced a sharp decrease in enrollment (defined as a drop of 10% or greater) in the past year.
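The derived measures defined in this section can be sketched as simple functions. The function and argument names are hypothetical, and two details are assumptions not stated in the text: the logarithm is taken to be the natural log, and institutions with zero non-resident students are rejected rather than adjusted.

```python
import math


def logged_pct_nonresident(nonresident: int, total: int) -> float:
    """Log of the percentage of total enrollment that is non-resident.

    Assumes a natural log; the handling of zero percentages is not
    described in the text, so this sketch simply raises an error.
    """
    pct = 100.0 * nonresident / total
    if pct <= 0:
        raise ValueError("log is undefined for a percentage of zero")
    return math.log(pct)


def steep_enrollment_decline(prior: int, current: int, threshold: float = 0.10) -> bool:
    """True if enrollment fell by 10% or more from the prior year."""
    return (prior - current) / prior >= threshold


def tuition_reliance(tuition_fee_revenue: float, total_revenue: float) -> float:
    """Percentage of total revenue that comes from tuition and fees."""
    return 100.0 * tuition_fee_revenue / total_revenue


# Illustrative values only, not taken from IPEDS.
print(steep_enrollment_decline(prior=1000, current=900))  # True
print(steep_enrollment_decline(prior=1000, current=950))  # False
print(tuition_reliance(40.0, 100.0))                      # 40.0
```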
Analysis
Because our outcome variable, a change in reopening plans that involved a switch from less to more in-person instruction, is one that happens over time, we estimated two primary sets of event history models, one for public institutions and another for private not-for-profit institutions. This analytic approach is used to explore events that unfold over time (in our case, a switch in instructional methods), can incorporate both time-varying and time-invariant covariates, and can be used to predict whether an event happens (at any point in time) while also accounting for when it happens (Box-Steffensmeier & Jones, 2004; DesJardins, 2003). The result of an event history model is a hazard rate, representing the probability that an event will occur at a point in time, given that it has not happened yet (DesJardins, 2003).
In general, an event history model explores each point in time during which an event can occur, called the risk period, and then evaluates whether one of two outcomes happens: success or failure. In our case, this model explores each day of July 2020 and evaluates the relationship that our predictor of interest (the logged percentage of non-resident students enrolled at an institution), as well as our covariates, has with whether an institution changed its instructional approach to incorporate more face-to-face learning (failure). If at a given point in time an observation fails (if an institution switches instructional approach), that observation exits the group of observations at risk of failure, the risk set, and is no longer considered in the model’s estimation. If the observation does not fail (no switch in reopening plans), then it remains in the risk set and continues to the next risk period.
We specifically used Cox proportional hazard models to analyze the relationship between the logged percentage of non-resident students and changes in reopening plans. Instead of assuming a functional form for the hazard rate, this approach to event history modeling relies on the data themselves to estimate it (Box-Steffensmeier & Jones, 2004). An additional advantage of Cox modeling is that it can accommodate multiple failures within the risk set at the same time, meaning that multiple institutions switching instructional methods on the same day is not an issue for the model. The probability of failure, the hazard rate \(h\left( t \right)\), is defined as in Eq. (1):
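The risk-set logic in this paragraph can be illustrated with a small sketch that expands institutions into institution-day rows and stops generating rows once an institution fails. The function name, field names, and example values are hypothetical.

```python
def build_risk_set_rows(switch_day_by_institution, n_days=31):
    """Expand institutions into institution-day rows, dropping days after failure.

    switch_day_by_institution maps an institution id to the day (1..n_days)
    on which it switched, or None if it never switched during the window.
    """
    rows = []
    for inst, switch_day in switch_day_by_institution.items():
        # An institution stays in the risk set through its day of failure,
        # or through the whole window if it never fails.
        last_day = switch_day if switch_day is not None else n_days
        for day in range(1, last_day + 1):
            rows.append(
                {"institution_id": inst, "day": day, "failed": int(day == switch_day)}
            )
    return rows


# Illustrative example with a 5-day window: A fails on day 3, C on day 1, B never.
rows = build_risk_set_rows({"A": 3, "B": None, "C": 1}, n_days=5)
print(len(rows))                        # 3 + 5 + 1 = 9
print(sum(r["failed"] for r in rows))   # 2
```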
$$h\left( t \right) = Pr(T = t_{i} |T \ge t_{i} ,x)$$
(1)
Here, \(T\) represents time (in this study, days), and \(t_{i}\) is the specific day when institution i switched instructional strategy. In other words, this equation says that the hazard rate is the probability of failure (\(T = t_{i}\)) conditional on the institution’s having not failed previously (\(T \ge t_{i}\)), that is, conditional on the institution’s comprising part of the risk set. Covariates can be entered into this model as additional conditions, aside from membership in the risk set, represented as \(x\) in (1) (Box-Steffensmeier & Jones, 2004). For each group of institutions (public and private), we ran our Cox model twice, once with and once without covariates.
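Ignoring the covariates \(x\), the conditional probability in Eq. (1) has a simple empirical counterpart: the number of institutions that switch on a given day divided by the number still in the risk set on that day. The sketch below is an illustration with made-up values, not the Cox estimator itself; it also shows that tied failures on the same day pose no difficulty for this calculation.

```python
from collections import Counter


def empirical_hazard(switch_days, n_days):
    """Share of at-risk institutions that fail on each day.

    switch_days holds, for each institution, the day it switched (1..n_days)
    or None if it never switched.
    """
    events = Counter(d for d in switch_days if d is not None)
    at_risk = len(switch_days)
    hazard = {}
    for day in range(1, n_days + 1):
        hazard[day] = events[day] / at_risk if at_risk else 0.0
        # Institutions that fail today leave the risk set before tomorrow.
        at_risk -= events[day]
    return hazard


# Five hypothetical institutions: one switches on day 1, two on day 3, two never.
print(empirical_hazard([1, 3, 3, None, None], n_days=4))
# {1: 0.2, 2: 0.0, 3: 0.5, 4: 0.0}
```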
The hazard rate that a Cox model estimates is defined as a conditional logit, that is, the likelihood of an event occurring conditional on survival until the current risk period, as in Eq. (2):
$$Pr \left( {T = t_{i} |T \ge t_{i} ,x} \right) = exp \left( {\alpha_{t} + \beta x_{t} } \right) /\left( {1 + exp \left( {\alpha_{t} + \beta x_{t} } \right)} \right)$$
(2)
Here, \(\beta\) is a vector of coefficients corresponding to both the predictor of interest (in our case, the logged percentage of non-resident students enrolled at an institution) and covariates, both time-varying and time-invariant. \(\alpha_{t}\) represents a constant that can vary over time (DesJardins, 2003).
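As an illustrative rearrangement (a standard algebraic identity, not an additional analysis), reading the left-hand side of Eq. (2) as the hazard rate \(h\left( t \right)\) from Eq. (1) and inverting the logistic form gives the log-odds of switching on a given day as a linear function of the covariates:

$$\log \left( {\frac{h\left( t \right)}{1 - h\left( t \right)}} \right) = \alpha_{t} + \beta x_{t}$$

Under this form, a one-unit increase in a single covariate changes the log-odds of switching by the corresponding element of \(\beta\), so \(exp \left( \beta \right)\) can be read as an odds ratio. Because the predictor of interest is logged, a one-unit increase corresponds to multiplying the percentage of non-resident students by \(e \approx 2.718\), assuming a natural logarithm (the base is not stated here).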
Robustness checks
In addition to these primary analyses, we conducted a series of robustness checks that acknowledge that, in addition to concerns about international student enrollment, higher education institutions were concerned about domestic student enrollment leading up to fall 2020, another financial consideration for institutional leaders. These additional analyses disaggregated total enrollment into international and domestic student enrollment and entered the log of each of these numbers into our Cox models separately. All other details of these analyses were the same as our primary analyses. The results of these analyses were strikingly similar to those of our primary analyses regarding international student enrollment, described in the following section. For this reason, we provide the results of these robustness checks in Appendix A but do not discuss them in detail here.
Finally, to answer our third research question, we further probed our models for private institutions specifically. As outlined previously, recent literature suggests that certain types of private not-for-profit institutions were in a more precarious financial situation (Kelchen, 2020; Taylor & Cantwell, 2019; Zemsky et al., 2020) even prior to the pandemic and thus might have been more susceptible to pressures to reopen for in-person instruction so as not to lose international student enrollment. For this reason, we divided private institutions into tertiles based on two key variables: tuition reliance (the percentage of total revenue made up of tuition and fees) and selectivity (the percentage of applicants that were admitted). For each variable, this approach created three groups of roughly equal size, and we subsequently ran the same analyses (with covariates) described above on each group of institutions individually. This approach resulted in six additional models: one for each of the three tuition reliance tertiles and one for each of the three selectivity tertiles.
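A tertile split of this kind can be sketched with the Python standard library. The cut-point method is an assumption (the default "exclusive" method of `statistics.quantiles`); the text does not say how cut points or ties were handled, and the values below are made up.

```python
from statistics import quantiles


def assign_tertiles(values):
    """Label each value 1, 2, or 3 according to the tertile it falls in."""
    # n=3 returns the two cut points that separate the three groups.
    lower, upper = quantiles(values, n=3)
    labels = []
    for v in values:
        if v <= lower:
            labels.append(1)
        elif v <= upper:
            labels.append(2)
        else:
            labels.append(3)
    return labels


# Hypothetical tuition reliance percentages for nine private institutions.
tuition_reliance_pct = [10, 20, 30, 40, 50, 60, 70, 80, 90]
print(assign_tertiles(tuition_reliance_pct))  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```

The same function would be applied separately to selectivity, giving three subsamples per variable and therefore six models in total.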