Introduction

Though the United States’ postsecondary education system is highly regarded, the country ranks poorly relative to other nations in the number of students completing Science, Technology, Engineering, and Math (STEM) degrees (Goldin & Katz, 2009).Footnote 1 Among the many causes of low STEM attainment in the United States, the underrepresentation of racial and ethnic groups stands out as one of the most salient and intriguing challenges (Xie et al., 2015). Over the past 15 years, prominent scientific organizations, such as the National Academies (Vest, 2010), the President’s Council of Advisors on Science and Technology (Olson & Riordan, 2012), the Congressional Research Service (Kuenzi, 2008), and the National Science Foundation (2010), have called for an increased emphasis on STEM education by increasing the number of STEM graduates and eradicating barriers for underrepresented minority students in STEM. Although there are several obstacles to increasing underrepresented minority student enrollment in, and successful completion of, STEM degrees, this paper asks, “How much have affirmative action bans decreased STEM college completions?” This study is not the first to pose such a question (Arcidiacono et al., 2016; Bleemer, 2020; Hill, 2017), but it is the first to piece together how differences in model specification, period studied, and STEM classification across prior studies have produced contradictory findings in the literature.

Historical Background and Legal Context

This study is set amidst 40-year-long arguments about the legitimacy of affirmative action. Since Bakke (“Regents of the University of California v. Bakke,” 1978), a Supreme Court case about a White applicant to the UC Davis Medical School claiming reverse discrimination, several other Supreme Court cases have also challenged affirmative action in postsecondary education (“Fisher v. University of Texas,” 2013; “Fisher v. University of Texas,” 2016; “Gratz v. Bollinger,” 2003; “Grutter v. Bollinger”, 2003; “Schuette v. Coalition to Defend Affirmative Action”, 2014). At the heart of the debate is the question, “Who is college for?” and, more poignantly, “Who are the most highly selective colleges for?” The debate itself brings the interpretation of the Equal Protection Clause of the 14th Amendment into question. One group argues that the Equal Protection Clause was created to assist racial and ethnic minorities who had been historically disadvantaged and continue to be disadvantaged. In contrast, the other group contends that racial and ethnic minorities receive an unfair advantage, violating equal opportunity.

Specifically, in this paper and many of the Supreme Court cases, the term URM (underrepresented racial minority) references Blacks, Hispanics, and Native Americans. The acronym generally refers to an ethnoracial group that holds a smaller percentage within a significant subgroup (e.g., college enrollees) than the ethnoracial group holds in the general population (e.g., college-aged people typically 18–22 years old). In the Grutter v. Bollinger Supreme Court decision and the various concurring and dissenting opinions, the term “underrepresented” appears 41 times in the opinions and twice in the four-page syllabus (i.e., summary). Although the Court never formally held that higher education institutions were required to operate using this definition, Justice Thomas’s minority opinion stated, “… the Court clearly approves of only one use of race—the distinction between underrepresented minority applicants and those of all other races” (“Grutter v. Bollinger”, 2003) [footnote 12]. In response, several institutions clarified their definitions of “underrepresented.” For instance, on March 19, 2004, the Association of American Medical Colleges (AAMC) Executive Committee adopted a clarification to its definition of “underrepresented in medicine” (Association of American Medical Colleges, 2022). However, the term has also been critiqued for denying groups the right to name themselves, for obfuscating the different circumstances within the URM group (e.g., differential impacts on Blacks and Hispanics), and for implying a racial hierarchy in which non-URMs dominate URMs (Bensimon, 2016; Lane, 2005; McNair et al., 2020; Walden et al., 2018). Nevertheless, since the Court has previously favored this construct of race regarding affirmative action, it is likely to continue to do so.

The fate of affirmative action is still largely to be determined. The Bakke decision was so equivocal that it is often described as an act of judicial statesmanship because it provided numerous concessions to both sides (Dworkin, 1978). Its legacy partly explains why the current litigation against Harvard University and the University of North Carolina was able to reach the Supreme Court (Murphy, 2019), with a ruling expected in 2023. As a result of the many Supreme Court decisions, principally Schuette v. Coalition to Defend Affirmative Action, the fate of affirmative action bans has become a state-level issue. The following states have all implemented state-level affirmative action bans at their public postsecondary institutions: CA 1997; TX 1997; WA 1999; FL 2001; MI 2007; NE 2009; AZ 2011; NH 2012; OK 2013; and ID 2020. However, Grutter v. Bollinger overturned the Fifth Circuit’s Hopwood v. Texas (1996) decision, which allowed Texas to reinstate affirmative action in public postsecondary education. The implementation of bans in these states and not others ultimately provides the empirical context for the natural experiment detailed in the methods section of this study.

The Scholarly Debate Over the Merits of Affirmative Action for Underrepresented Minority Students in STEM

Not long after the first affirmative action case involving postsecondary education in 1978 (Bakke), Thomas Sowell hypothesized that although affirmative action had long been seen as beneficial to underrepresented racial and ethnic groups, it was paradoxically detrimental to them (Kerwick, 2011; Sowell, 1978, 1994). The a priori rationale was that these underrepresented groups would be academically overmatched by their non-underrepresented counterparts in the postsecondary classroom and thus disadvantaged when competing for course grades and other opportunities afforded on merit. Other scholars have supported this view that affirmative action is detrimental to underrepresented minority students (Thernstrom, 1994; Thernstrom & Thernstrom, 1998, 2004). The position has also gained traction outside of academia and has been espoused by Justice Thomas, who frames aid, assistance, and support as entitlements that cause minority groups not to work as hard for their aspirations (“Gratz v. Bollinger,” 2003; “Grutter v. Bollinger”, 2003).

Although this mismatch position is popular among conservatives and some academics, the overwhelming empirical evidence finds that affirmative action is not only beneficial for underrepresented minority students (Alon, 2009, 2015; Alon & Tienda, 2005, 2007; Backes, 2012; Bowen & Bok, 2016; Hinrichs, 2012, 2014, 2020; Orfield, 2001), but that racial diversity is beneficial to the various groups that experience it, and the organizations that attain it (Guinier, 2015; Gurin & Nagda, 2006; Gurin et al., 2002, 2003, 2004a, 2004b; Hong & Page, 2004; Hurtado et al., 2003; Miller & Page, 2009; Page, 2008, 2010).

Scholars are now torn about how affirmative action bans impact the STEM degree attainment of URM students (Arcidiacono et al., 2016; Bleemer, 2020; Hill, 2017). Once largely treated as an afterthought, the mismatch hypothesis was reinvigorated when Arcidiacono et al. (2016) found evidence supporting Sowell’s original position. Specifically, their study found that less academically prepared URMs at selective schools in California would have had higher graduation rates at less selective schools, and it hypothesized that affirmative action bans (which were not in place during the period they studied) would have increased the STEM graduation rates of URM students. On the other hand, Bleemer (2020) rejects this hypothesis of affirmative action mismatch in STEM, finding empirical evidence, also from California, that less academically prepared URMs displaced from more highly selective schools were not more likely to graduate with STEM degrees at less selective schools. Similarly, Hill’s (2017) study builds upon earlier work estimating the impact of affirmative action bans on college completion (Backes, 2012; Hinrichs, 2012, 2014), but it explicitly investigates the differing impacts of the bans on URM STEM and non-STEM completions. Hill finds declines in both STEM and non-STEM degree attainment due to affirmative action bans. However, when comparing STEM and non-STEM degrees, Hill finds no STEM penalty: STEM is not impacted by affirmative action bans more than non-STEM.

This study investigates the STEM penalty by examining the taxonomy used, the period analyzed, and which tiers of postsecondary education are considered in these prior studies. To clarify how this analysis distinguishes itself from previous ones, Table 1 is included below.

Table 1 Characteristics of prior work on affirmative action and STEM

Table 1 shows the fundamental differences in research design across the previous studies of STEM and affirmative action bans. The first difference concerns which 4-year schools are analyzed. Most studies focus on the de jure targets of affirmative action bans: public schools. Only Backes (2012) and Bleemer (2020) include an analysis of the bans on private and for-profit schools. Furthermore, the dominant strategy for measuring school selectivity in these studies is Scholastic Aptitude Test (SAT) score deciles. Barron’s Admissions Competitiveness Index offers a more holistic alternative because it incorporates factors beyond SAT scores, such as an institution’s acceptance rate, student GPAs, and high school class rank. Additionally, since some schools do not report SAT scores to the Integrated Postsecondary Education Data System (IPEDS) or to U.S. News & World Report (often because those schools do not require them for admission), many of the previous studies overlook less selective schools to some degree. This study compares both measures of selectivity to see how they affect the findings.

The most intriguing point is that every study described in Table 1 uses a different STEM taxonomy. As such, this study shows how different taxonomies have led to varying narratives about the impact of affirmative action bans on STEM college completions. The Department of Homeland Security (DHS) and National Science Foundation (NSF) STEM measures (detailed in the Online Appendices) differ in meaningful ways. Although both use Classification of Instructional Programs (CIP) codes from the National Center for Education Statistics, they use different formats of the codes, which come in two-, four-, and six-digit series. The DHS uses the six-digit series, while the NSF uses the four-digit series. Using more digits means the DHS measure can be more precise about what counts as STEM or non-STEM. For example, although the 01.03 four-digit series includes ten six-digit codes, the DHS STEM taxonomy includes only 01.0308 (Agroecology and Sustainable Agriculture). Furthermore, the DHS STEM taxonomy includes several codes sharing a two-digit series identifier that the NSF STEM taxonomies do not.Footnote 2 Variations at this level reflect a more fundamental difference in what counts as STEM and what does not. For example, the DHS measure classifies quantitative disciplines, fields, and subfields from the social sciences as STEM, such as Educational Statistics and Research Methods, Research and Experimental Psychology, and Economics and Quantitative Economics. These differences in measurement help explain why this study finds different narratives about the impact of affirmative action bans on STEM completions.
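The nesting of CIP code series can be sketched in a few lines. The helper below is purely illustrative (it is not part of either official taxonomy) and relies only on the “XX.YYZZ” formatting of CIP codes described above:

```python
# Illustrative helper: truncate a six-digit CIP code (e.g., "01.0308") to its
# two- or four-digit series. CIP codes are formatted "XX.YYZZ", so the
# two-digit series is the first two characters and the four-digit series is
# the first five (including the period).
def cip_series(code: str, digits: int) -> str:
    if digits == 2:
        return code[:2]
    if digits == 4:
        return code[:5]
    return code  # six-digit codes are already fully specified

# "01.0308" (Agroecology and Sustainable Agriculture) nests inside the
# "01.03" four-digit series and the "01" two-digit series.
print(cip_series("01.0308", 4))  # → 01.03
print(cip_series("01.0308", 2))  # → 01
```

This makes concrete why a six-digit (DHS-style) taxonomy can include a single code from a four-digit series that a four-digit (NSF-style) taxonomy must include or exclude wholesale.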

The Enduring Impacts of the Bans

In addition to exploring different studies’ taxonomies, this analysis also considers when the impacts of the bans were strongest. Some organizations have advocated for alternative ways to address diversity in race-neutral environments (Coleman et al., 2019). However, scholars have shown that using race-alternative measures, such as socioeconomic affirmative action, may reduce the racial gap in college enrollment and completion but do not restore racial diversity to levels achieved using race-based affirmative action (Alon, 2015; Long, 2016; Long & Bateman, 2020; Reardon et al., 2017). Moreover, only a handful of scholars have empirically investigated the long-run impacts of affirmative action bans. Mickey-Pabello and Garces (2018) find that the effects of affirmative action bans on URMs enrolling in medical school persisted but decreased in magnitude over time. Similarly, Long and Bateman (2020), while only investigating a handful of highly selective public schools, also find that the impact of affirmative action bans on 4-year degrees has been persistent over time.

Data and Methods

This study uses data from the Integrated Postsecondary Education Data System (IPEDS) and the Current Population Survey (CPS) from 1991 to 2016, taking advantage of rich state-level variation that captures the implementation years of the bans in each state. This time period maximizes the number of observations in the analytic sample and allows the inclusion of timing lags to scrutinize when effects became observable after a state’s ban went into effect using a difference-in-differences analytic design.Footnote 3 By adding the years 2012 to 2016, this study builds on the undergraduate affirmative action studies pioneered by Hinrichs (2012) and Backes (2012) and makes its right-censoring point equivalent to that of Hill’s (2017) STEM study.

One of the aims of this study is to observe how affirmative action bans affected the attainment of STEM degrees by URM students and compare that result to URM non-STEM degree completion. Since the aim is to investigate if there is an additional STEM penalty for affirmative action, the dependent variables used in this study are the proportion of URM STEM graduates (i.e., the proportion of STEM graduates that are URMs) and the proportion of URM non-STEM graduates (i.e., the proportion of non-STEM graduates that are URMs) at each institution. This allows one to discern differences between the patterns for STEM and non-STEM college completions. The numerator is the number of URM STEM graduates (i.e., URM students who graduated with a STEM degree), and the denominator is the total number of STEM graduates (the same process is followed to produce the non-STEM proportion). This is the same as the dependent variable constructed by Hill (2017). The STEM/non-STEM distinction was based on the CIP codes used by the Department of Homeland Security (2016) to classify STEM students (See Online Appendix A).
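The construction of the dependent variable can be sketched directly; the completion counts below are invented for illustration and do not come from IPEDS:

```python
# Hypothetical completion counts for one institution-year (made-up numbers).
stem_grads = {"urm": 42, "total": 300}        # URM STEM grads / all STEM grads
non_stem_grads = {"urm": 310, "total": 1500}  # URM non-STEM / all non-STEM

def urm_share(grads):
    """Proportion of graduates in a category (STEM or non-STEM) who are URM."""
    return grads["urm"] / grads["total"]

print(urm_share(stem_grads))                 # → 0.14
print(round(urm_share(non_stem_grads), 4))   # → 0.2067
```

The two shares are computed separately per institution-year, so a ban can move the STEM share and the non-STEM share independently, which is what the STEM-penalty comparison exploits.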

The units of analysis for this study are 4-year bachelor’s degree-granting postsecondary institutions. Private schools are included because Backes (2012) and Hirschman and Berrey (2017) found evidence that the bans may spill over from public schools onto private ones, even though private schools are not the de jure targets of affirmative action bans. Another analysis for this paper is conducted using the National Science Foundation’s (2019) classification (see Online Appendix B). For the sensitivity analyses (e.g., including and excluding states, changing the years of the analysis, investigating the long-term impacts of the bans, and measuring selectivity with SAT scores), only the Department of Homeland Security’s (DHS) measure is used because it produces a smaller, more conservative effect on URM STEM completions than the National Science Foundation’s (NSF) measure. These analyses explore the features of the preexisting research literature that may have produced results indicating no STEM penalty (Hill, 2017) or decreased URM STEM completions (Arcidiacono et al., 2016) due to affirmative action. Furthermore, the DHS taxonomy defines STEM more rigidly; the NSF taxonomy includes Architecture as STEM but excludes Science Technologies and Technicians as well as Health Professions and Related Programs.

Many state-level variables from the Current Population Survey (CPS) are included to control for state-level effects on URM STEM and non-STEM attainment (see Table 2). Adjusting for state-level characteristics in a multivariate analysis addresses the fact that states are not interchangeable units of analysis; they have unique features that may influence the investigation. In addition, if too few covariates are included in the model, the difference-in-differences estimator may be biased (a more detailed discussion and related tables are included in the Online Appendix). The covariates are state-level racial demographics (the percentage of the population that is White, Black, Latino, Native American, Asian, and other); state-level educational attainment, measured as the percentage of the population 25–34 years old with at least a bachelor’s degree; and state-level economic indicators, including the unemployment rate of the population most susceptible to the impact of the bans upon graduation (25- to 34-year-olds) and median personal income (also for 25- to 34-year-olds).

Table 2 State-level control variables for ban and non-ban states

Another set of variables accounts for characteristics of the bachelor’s degree-granting institution, including enrollment size (used as a weight, not a covariate), public/private status, and selectivity. Enrollment size is used to weight the observations so that the analysis reflects what the affirmative action bans mean for the typical student rather than for the typical school (which is what the unweighted analysis captures). This distinction is clarified in Table 3, which presents both weighted and unweighted results. Colleges’ private or public status is coded as a binary variable where 0 indicates private and 1 indicates public.

Table 3 Number of institutions by Barron’s classification

Selectivity is classified by Barron’s Admissions Competitiveness Index with the categories “Most Competitive,” “Highly Competitive,” “Very Competitive,” “Competitive,” “Less Competitive,” and “Noncompetitive.” As in previous studies, an additional sensitivity analysis uses SAT scores instead of the Barron’s index. The main results use the Barron’s data because they include a larger sample of schools, since not all schools submit their SAT scores or require them for admission. Table 3 below shows the distribution of Barron’s selectivity categories for public and private institutions across states with and without affirmative action bans. Because some categories contain very few schools (e.g., only eight public schools fall into the “Most Competitive” group), which can cause issues with statistical power, the categories “Most Competitive,” “Highly Competitive,” and “Very Competitive” were recoded as “Highly Selective.” In the same vein, “Competitive,” “Less Competitive,” “Noncompetitive,” and “Special” were recoded as “Less Selective.” Schools that Barron’s did not index were recoded as “Unclassified.”
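The recoding described above can be expressed as a simple mapping. This is a sketch of the categorization logic only; treating any category absent from the mapping as “Unclassified” is this illustration’s assumption for non-indexed schools:

```python
# Collapse Barron's categories into the three tiers used in the analysis.
RECODE = {
    "Most Competitive": "Highly Selective",
    "Highly Competitive": "Highly Selective",
    "Very Competitive": "Highly Selective",
    "Competitive": "Less Selective",
    "Less Competitive": "Less Selective",
    "Noncompetitive": "Less Selective",
    "Special": "Less Selective",
}

def selectivity_tier(barrons_category):
    # Schools Barron's did not index fall through to "Unclassified".
    return RECODE.get(barrons_category, "Unclassified")

print(selectivity_tier("Very Competitive"))  # → Highly Selective
print(selectivity_tier(None))                # → Unclassified
```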

Table 3 shows that the 2344 schools included in this study constitute a far larger and more representative sample of institutions than in other studies. Hinrichs (2014) analyzes an unknown number of public schools (N = 12,274 institution-years). Backes (2012) analyzes 526 public schools and 890 private schools. Hill (2017) analyzes 402 public schools. Arcidiacono et al. (2016) and Bleemer (2020) analyze only schools in California. However, sensitivity analyses are used in this study to investigate congruence with these prior studies. For example, limiting the sample of ban states to California and Texas makes it possible to investigate the congruence between this work and that of Arcidiacono et al. and Bleemer, even though this analysis is not limited to California alone (as their work is). Because the modeling strategy for this work is based on institutions, the analysis requires an additional state (in this case, Texas) to quell concerns about over-rejecting the null hypothesis that the difference-in-differences estimator equals zero (Carter et al., 2017). These sensitivity analyses demonstrate how model specification changes the results while providing evidence that the findings are congruent with the aforementioned studies when the model specifications are similar to theirs.

Analytic and Estimation Strategy: Difference-in-Differences-in-Differences

A difference-in-differences strategy is used to estimate the impact of affirmative action bans on the proportion of URM STEM graduates (i.e., the proportion of STEM graduates who are URMs). This strategy has been used by prominent studies that examine the impact of policy changes on educational outcomes (Dynarski, 2004; Long, 2004) and by scholars who study bans on affirmative action (Backes, 2012; Garces, 2013; Garces & Mickey-Pabello, 2015; Hill, 2017; Hinrichs, 2012). The “first difference” of this strategy compares the proportion of URM STEM graduates before and after an affirmative action ban was implemented to determine whether changes are associated with the ban’s start. If the affirmative action ban impacted the proportion of URM STEM graduates, there would be a decline after the policy went into effect. However, because the proportion of URM STEM graduates may differ from year to year for other reasons (e.g., period effects, changes in demographics, or labor market conditions), this first difference may also reflect those other changes. Thus, a “second difference” is used to capture any external trends by taking advantage of a comparison group: states where affirmative action bans were not implemented. In those states, changes in STEM graduation over the same period are attributed to underlying trends rather than to the affirmative action bans. Subtracting the second difference from the first yields an estimate of the causal impact of affirmative action bans on the proportion of URM STEM graduates.
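The two differences can be illustrated with a minimal numeric sketch; all of the shares below are invented for illustration and are not estimates from this study:

```python
# Mean proportion of URM STEM graduates, pre- and post-ban (made-up values).
ban_states    = {"pre": 0.134, "post": 0.122}
no_ban_states = {"pre": 0.131, "post": 0.129}

first_diff = ban_states["post"] - ban_states["pre"]         # change in ban states
second_diff = no_ban_states["post"] - no_ban_states["pre"]  # underlying trend
did_estimate = first_diff - second_diff                     # ban effect net of trend

print(round(did_estimate, 3))  # → -0.01
```

Here the raw decline in ban states (1.2 points) overstates the ban’s effect because a small decline (0.2 points) also occurred where no ban existed; the difference-in-differences estimate nets that trend out.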

The difference-in-differences model is estimated in a multilevel regression framework using fixed effects to account for the data’s hierarchical nature (observations over time, nested within states) (Murnane & Willett, 2011). Furthermore, because standard errors are often poorly estimated in difference-in-differences models (i.e., not well estimated by state fixed effects alone or by robust standard errors), state-clustered standard errors are used to provide a more conservative estimate of the impact of affirmative action bans (Bertrand et al., 2004).

The following multilevel ordinary least squares regression was fitted (the levels are colleges [i] and states [s]):

$$STEM_{ist + 4} = \beta_{0} + \beta_{1} \left( {BAN_{st + 4} } \right) + \beta_{2} W_{st + 4} + \beta_{3} I_{it + 4} + \beta_{4} cyear_{t + 4} + \gamma S_{s} + n_{s} Scyear_{st + 4} + \alpha Z_{t + 4} + \varepsilon_{ist + 4}$$
(1)

where \(STEM_{ist}\) indicates the proportion of STEM (or non-STEM) graduates who are URMs. \(BAN_{st}\) is a dichotomous variable indicating whether a state (s) had an affirmative action ban in place in year (t). Year t, when the ban was implemented, is lagged by 4 years (+ 4) because the first cohort of college students affected by a ban typically needs at least 4 years to graduate. This lag is essential because changes in the proportion of URM STEM graduates are unlikely to appear immediately after the bans. \(W_{st}\) represents a matrix of selected time-varying state characteristics, such as state-level racial demographics (i.e., percentage of the population that is White, Black, Latino, Native American, or other), state-level educational attainment (i.e., the percentage of the population 25–34 years old with at least a bachelor’s degree), and state-level economic indicators, including the unemployment rate of the population most susceptible to the impact of the bans (25- to 34-year-olds).

These covariates control for time-varying state characteristics.Footnote 4 \(I_{it}\) (not to be confused with the identity matrix) represents a matrix of institutional characteristics, including enrollment size, percentage of students receiving financial aid, institutional selectivity, and cost of tuition. \(S_{s}\) indicates a set of vectors distinguishing among the states and controls for all time-invariant differences—both observed and unobserved—among the states (state fixed effects). \(cyear_{t}\) represents a continuous year variable (coded so that 1991 = 1, 1992 = 2, 1993 = 3, etc.). \(Scyear_{st}\) represents a complete set of two-way interactions between each state dummy and a continuous predictor representing the linear effect of year. \(Z_{t}\) represents a set of vectors for years, distinguishing among the chronological years to which the bans apply and accounting for average differences in the outcome across the chronological years covered in the data (year fixed effects), which span 1991 to 2016. \(\varepsilon_{ist}\) represents the residual. Both fixed effects and linear trends were used because fixed effects capture year-specific changes and national trends, while linear trends capture state-specific trends; neither created the collinearity issue that can sometimes occur in these models. Because of the presence of the state and year fixed effects, \(\beta_{1}\) provides the required difference-in-differences estimate of the impact affirmative action bans have had on the proportion of URM STEM college completions (i.e., the proportion of STEM graduates who are underrepresented minorities).
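A minimal synthetic example of this estimator can be sketched with NumPy. This is not the study’s data or code: the panel, the true effect of − 0.01, and the noise level are all invented. State and year fixed effects enter as dummies, and the coefficient on the ban indicator recovers the difference-in-differences estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic panel: 4 states observed for 10 years; states 0 and 1 adopt a
# "ban" from year 5 onward. The true ban effect is -0.01.
n_states, n_years = 4, 10
s = np.repeat(np.arange(n_states), n_years)   # state index per observation
t = np.tile(np.arange(n_years), n_states)     # year index per observation
ban = ((s < 2) & (t >= 5)).astype(float)
y = 0.13 - 0.01 * ban + 0.001 * t + rng.normal(0.0, 1e-4, s.size)

# Design matrix: intercept, ban indicator, state dummies (state fixed
# effects), and year dummies (year fixed effects).
X = np.column_stack(
    [np.ones_like(ban), ban]
    + [(s == k).astype(float) for k in range(1, n_states)]
    + [(t == k).astype(float) for k in range(1, n_years)]
)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(round(beta[1], 3))  # difference-in-differences estimate, ≈ -0.01
```

The year dummies absorb the common 0.001-per-year trend, so the ban coefficient isolates the treatment effect; the study’s actual models additionally include covariates, state-specific linear trends, and state-clustered standard errors, which this sketch omits.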

Difference-in-differences-in-differences coefficients (Gruber, 1994) are also used to model the heterogeneity in the influence of affirmative action bans by school selectivity. While scholars often use triple differencing (DDD) as a robustness check to ensure that certain variables (or placebos) are not influencing the difference-in-differences estimator (Chetty et al., 2009; Ravallion et al., 2005), it is used here to investigate the heterogeneity of the bans’ impact by school selectivity. These differences are essential because affirmative action bans are known to impact the most highly competitive schools powerfully while affecting other schools less or not at all. This third difference subtracts the difference-in-differences result for the baseline group (“most competitive”) from those of the other selectivity groups (e.g., “competitive”).

The difference-in-differences-in-differences estimate is specified as follows:

$$STEM_{ist + 4} = \beta_{0} + \beta_{1} \left( {BAN_{st + 4} } \right) + \delta \left( {BAN_{st + 4} Selectivity_{i} } \right) + \beta_{2} W_{st + 4} + \beta_{3} I_{it + 4} + \beta_{4} cyear_{t + 4} + \gamma S_{s} + n_{s} Scyear_{st + 4} + \alpha Z_{t + 4} + \varepsilon_{ist + 4}$$
(2)

where \(\delta\) is the difference-in-differences-in-differences estimator for the interaction between a school’s selectivity (\(Selectivity_{i}\)) and the presence of a ban in a state in a given year \(\left( {BAN_{st + 4} } \right)\). The \(Selectivity_{i}\) component of the interaction is also present within \(I\), the matrix of institutional characteristics.

Lastly, to investigate whether the bans were most impactful shortly after or long after implementation, as Mickey-Pabello and Garces (2018) have done, purposeful right-censoring of the ban states is used to determine when the impact of the bans was strongest. In the formula below, k represents the number of influential years of the bans. For example, restricting the ban states to just 2 years of influence (k = 2) corresponds to keeping 6 years after implementation minus the 4-year lag that allows students to graduate. Sensitivity analyses such as these are generally required when using difference-in-differences and difference-in-differences-in-differences because they are quasi-experimental methods subject to alternative explanations that must be considered or ruled out (Shadish et al., 2002).

$$STEM_{isk} = \beta_{0} + \beta_{1} \left( {BAN_{sk} } \right) + \beta_{2} W_{sk} + \beta_{3} I_{ik} + \beta_{4} cyear_{k} + \gamma S_{s} + n_{s} Scyear_{sk} + \alpha Z_{k} + \varepsilon_{isk}$$
(3)
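The right-censoring arithmetic can be made concrete with a small sketch of the windowing logic (not the study’s code; the example ban year is the hypothetical choice of this illustration):

```python
LAG = 4  # years for the first affected cohort to graduate

def influential_years(ban_year, window_years):
    """Post-lag years of ban influence (k) retained when the ban-state panel
    is right-censored `window_years` after implementation, plus the
    graduating-cohort years those k years correspond to."""
    k = window_years - LAG
    cohorts = list(range(ban_year + LAG, ban_year + window_years))
    return k, cohorts

# A ban implemented in 1997, censored 6 years out, contributes k = 2
# influential years: the 2001 and 2002 graduating cohorts.
print(influential_years(1997, 6))  # → (2, [2001, 2002])
```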

Findings

The main findings of this paper, using both the DHS and NSF taxonomies, indicate that the share of URM STEM college completions has decreased due to affirmative action bans. The DHS results (upper panel, Table 4) show a milder impact, while the NSF results (lower panel, Table 4) suggest more potent effects of the bans. These findings are presented both unweighted, emphasizing the typical school, and weighted by enrollment, highlighting the typical student. The DHS unweighted results show that the share of URM STEM college completions at the typical school of any level of selectivity significantly decreases for all public schools (− 0.0122) and all private schools (− 0.0059). Similarly, the weighted results in terms of the typical student also show significant decreases in the share of URM STEM completions for all public and all private schools (− 0.0053 and − 0.0083, respectively). The corresponding values for the NSF taxonomy are − 0.0200 (unweighted public), 0.0110 (unweighted private), − 0.0203 (weighted public), and 0.0361 (weighted private). While both taxonomies confirm a URM STEM decline for public schools (a more considerable decline for the NSF taxonomy), the primary distinction between the two is that the DHS taxonomy shows a decrease in URM STEM attainment at private schools while the NSF taxonomy shows an increase.

Table 4 Main results: the impacts of affirmative action bans on URM college completion by STEM and non-STEM

To better understand the magnitude of the bans, models were estimated that represent the average share of URM STEM college completions in states with an affirmative action ban in the years before the bans went into effect. These baseline models were estimated for public schools (0.1343) and private schools (0.1206) using the DHS taxonomy, all states, and all levels of selectivity; the ban estimates were then expressed relative to these baselines. Using the DHS estimates (top panel of Table 5), this strategy yields an estimated percentage impact of the bans on URM STEM degrees of − 4% for all public schools and − 7% for all private schools (rounded). The same process for non-STEM majors shows that URM non-STEM college completions declined by 3% at public schools and 9% at private schools (rounded).
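The rounded percentage magnitudes appear to follow from dividing a ban coefficient by the pre-ban baseline share. The sketch below reproduces the − 4% and − 7% figures under that reading; treating the weighted DHS coefficients from Table 4 as the numerators is this sketch’s assumption about the exact differencing:

```python
# Pre-ban baseline shares of URM STEM graduates (from the text) and the
# weighted DHS ban coefficients (Table 4). Dividing coefficient by baseline
# is this sketch's reading of how the Table 5 percentages were produced.
baseline = {"public": 0.1343, "private": 0.1206}
ban_coef = {"public": -0.0053, "private": -0.0083}

pct_impact = {k: round(100 * ban_coef[k] / baseline[k]) for k in baseline}
print(pct_impact)  # → {'public': -4, 'private': -7}
```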

Table 5 Understanding the magnitude of the bans

Assessing the STEM penalty, the DHS taxonomy produces a tiny difference between STEM and non-STEM students at public schools (− 4% vs. − 3%), while the NSF taxonomy produces a much larger reduction in STEM relative to non-STEM (− 13% vs. − 5%). The DHS taxonomy shows that URM college completions at private schools decreased by 7% (rounded) for STEM and 9% (rounded) for non-STEM. However, for private schools, the NSF taxonomy estimates indicate that URM STEM college completions increased by 7% (rounded) while URM non-STEM completions declined very modestly, by less than 1%. This pattern of increased URM STEM completions at private schools (in Table 5) also appears in Table 4. When selectivity level is not taken into consideration for the DHS estimates (see top row, Table 4), the declines for STEM degrees at private schools (− 0.0059 unweighted and − 0.0083 weighted) were lower in magnitude than the declines for non-STEM (− 0.0158 unweighted and − 0.0146 weighted). The corresponding NSF estimates show an increase in URM STEM completions at private schools (0.0110 unweighted and 0.0361 weighted) alongside small non-STEM declines (− 0.0024 unweighted and − 0.0146 weighted). In sum, the two taxonomies produce vastly different results for STEM at private schools.

The Importance of Selectivity

When examining the DHS results by level of selectivity (see Table 4), the highly selective schools are impacted the most by the bans. For STEM completions, the impact is more substantial for public schools (−0.0162 unweighted and −0.0106 weighted) than for private schools (−0.0102 unweighted and −0.0097 weighted). However, for non-STEM completions, the results are inconclusive: the estimates for public schools are larger in magnitude than those for private schools in the unweighted results emphasizing the typical school (−0.0137 vs. −0.0124), but not in the weighted results highlighting the typical student (−0.0106 vs. −0.0109). At less selective schools, the ban estimates are mostly non-significant for STEM, apart from the weighted result for private schools (−0.0082). However, at less selective schools, the estimates are all statistically significant for non-STEM completions, for both public (−0.0067 and −0.0106) and private (−0.0147 and −0.0153) schools. When examining STEM completions at unclassified schools (i.e., schools not indexed by Barron's), the results for public schools (−0.0111 and −0.0149) are significant, while the results for private schools are not. Moreover, for non-STEM completions, the effects are negative for the public and private unweighted results (−0.0066 and −0.0015, respectively) but positive for the weighted results (0.1164 and 0.2338). This comparison between public and private schools suggests that some large private schools absorbed URM non-STEM students in the wake of the bans (although the typical private school did not), while public schools lost a share of URM students.

The results for STEM completions are remarkably different between the NSF and DHS estimates. Unlike the DHS estimates, the NSF results show declines for STEM completions at public schools at every level of selectivity: at the most selective schools (−0.0209 and −0.0218), at less selective schools (−0.0186 and −0.0264), and at unclassified schools (−0.0146 and −0.0164). Analyzing the tiers of selectivity for STEM completions at private schools shows that the weighted result at less selective schools (0.0579) and the unweighted result at unclassified schools (0.0192) drive the overall NSF increase in URM STEM college completions at private schools (0.0110 unweighted and 0.0361 weighted). By contrast, the DHS results indicate a decline in STEM at private schools (−0.0059 unweighted and −0.0083 weighted).

Concerning non-STEM completions at unclassified schools, the estimates are reasonably similar to those for STEM, except for the positive coefficients in the weighted public (0.1164) and private (0.2338) DHS results and the negative coefficients in the weighted public (−0.0092) and private (−0.0056) NSF results. The DHS taxonomy suggests that some large, unclassified private schools absorbed a sizeable number of URM non-STEM students, while under the NSF taxonomy these schools did not absorb the same students. This difference further underscores how critical the choice of STEM taxonomy is to understanding the impact of affirmative action bans on URM STEM college completions. Notably, it highlights that the different STEM offerings at less selective and unclassified schools (as classified by the DHS and NSF taxonomies) drive such stark differences in the estimated role of affirmative action in URM STEM completions.

Understanding the Long-Term Impacts of Affirmative Action Bans

Previous studies show that the impacts of affirmative action bans are strongest early on and wane over time for medical schools (Mickey-Pabello & Garces, 2018), while the effects persist at highly selective public schools (Long & Bateman, 2020). Table 6 displays the long-term impacts of the bans. Concerning all schools of any selectivity, this study finds that as time passes, URM STEM completions decline more starkly at public schools, both in terms of the typical school (−0.0038, −0.0066, −0.0065, −0.0072, −0.0079, −0.0102, and −0.0122) and the typical student (0.0033, −0.0048, −0.0046, −0.0049, −0.0054, −0.0100, and −0.0053). The pattern is different at private schools, where URM STEM completions initially increase (0.0140, 0.0086, 0.0071, 0.0048, and 0.0036) but decline over time (−0.0057 and −0.0059). This pattern holds for both the typical school and the typical student. URM non-STEM completions follow a similar pattern at both private and public schools. The impacts for the typical school are slightly negative (−0.0052 and −0.0015) in the 2-year post-ban period, but their magnitude grows as time passes (−0.0101 and −0.0158 for the whole analytic period). In terms of the typical student, there are initial gains in URM attainment at public schools (0.0024) but not at private schools (−0.0123); the public school estimate later turns negative (−0.0053), and the private school estimate becomes more negative still (−0.0146).

Table 6 The growing decline in URM-STEM and non-STEM completions over time

The findings are slightly different for highly selective schools (see bottom panel, Table 6). STEM college completions at highly selective public schools show a similar magnitude and direction as all schools regardless of selectivity (for both the typical school and the typical student). However, there is no conclusive evidence of increased URM STEM college completions at private schools. Non-STEM completions at public schools also follow a similar pattern: the decline grows larger as time elapses (for both the typical school and the typical student). The non-STEM results for highly selective private schools likewise follow the same pattern as private schools of any selectivity: a growing decline in URM non-STEM completions as time passes (Figs. 1, 2).

Fig. 1

The two tales of affirmative action bans: DHS vs NSF

Fig. 2

The growing decline in URM-STEM and non-STEM completions over time

In all, these results strongly refute empirical claims that institutions of higher education have been able to mitigate the effects of affirmative action bans through other initiatives, such as the targeted recruitment of particular geographic regions or a greater emphasis on socioeconomic affirmative action as a proxy for race-based affirmative action (Alon, 2015; Coleman et al., 2019; Long & Bateman, 2020; Reardon et al., 2017). The downward trends in Fig. 2 suggest that the impact of the bans on URM college completions, for both STEM and non-STEM, has generally grown in magnitude over time.

Investigating Differences in the Impact of Affirmative Action Bans on STEM Attainment Attributable to Selectivity Measurement, Sample Specification, and Analytic Period

Previous studies differ from this one methodologically in distinct ways that could explain the discrepancies in findings: other studies use fewer units of analysis, some place particular emphasis on Texas and California, and selectivity is operationalized differently. A variety of sensitivity analyses using the difference-in-differences strategy investigate how congruent the findings in this paper are with other work on STEM and affirmative action (see Table 7). Hinrichs's (2012) study places more weight on results in California and Texas because those states contribute more "school years" (i.e., observations) relative to other states with bans: his analytic window spans 1995 to 2007, and Texas and California have more schools than the other ban states. Backes's (2012) study similarly emphasizes Texas and California because it spans 1990 to 2009.
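For readers unfamiliar with the strategy, the core difference-in-differences logic can be sketched in its simplest 2×2 form. The numbers below are made up for illustration; the studies discussed estimate regression models over many schools, states, and years rather than simple cell means:

```python
# Minimal 2x2 difference-in-differences with made-up shares
# (illustrative only; not the actual estimates from any study discussed).
def did(treated_pre, treated_post, control_pre, control_post):
    """Change in ban states minus change in non-ban states."""
    return (treated_post - treated_pre) - (control_post - control_pre)

# Hypothetical URM STEM shares: ban states fall 0.134 -> 0.128,
# while non-ban states drift 0.130 -> 0.131 over the same window.
print(round(did(0.134, 0.128, 0.130, 0.131), 4))  # -> -0.007
```

The control-state trend nets out changes common to all states (e.g., national shifts in STEM enrollment), isolating the change attributable to the ban.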

Table 7 Difference-in-differences: sensitivity analyses

Selectivity Measurement

Backes (2012) uses SAT quantiles to measure selectivity. He uses four groups: the first, second, and third deciles of the highest SAT scores, and a fourth for the lower seven deciles (including schools that do not report SAT scores). That approach yielded only 526 public and 1029 private schools (roughly two-thirds of the 2344 used in this study). The sensitivity analysis for this work features six panels. The first ("selectivity by Barron's, TX and CA only") excludes all ban states except California and Texas (top panel, Table 7). Next, the analytic period is reduced to 1991–2009 to replicate the time periods used by previous studies (panel for 1991–2009, TX and CA only). The next panel (1991–2009 for all states with bans) repeats the analysis to show how the findings depend on the inclusion or exclusion of ban states. The final three panels measure selectivity with SAT deciles, as in the previous studies (see Table 1), instead of Barron's Admissions Competitiveness Index.

Sample Specification

There is a notable difference in findings when using Barron's Admissions Competitiveness Index to examine Texas and California compared to all states. Comparing selectivity by Barron's for TX and CA only (see top panel, Table 7) to the selectivity for all states using the DHS taxonomy (see top panel, Table 4) reveals that California and Texas experienced less of an impact from affirmative action bans than all ban states combined, for both STEM and non-STEM URM college completions, at both public and private schools.

Analytic Period

Most intriguing are the differences across the analytic period. When the analysis is restricted to 1991–2009 for California and Texas (see second panel, Table 7), there are increases in URM STEM completions at public schools (statistically insignificant) and at private schools (statistically significant). The public school finding becomes statistically significant in the 1991–2009 analysis when all the ban states are included (see third panel).

Changing the analytic period is also crucial for Texas and California. Over the full analytic period, declines in URM STEM completions appear only at highly selective private schools (−0.0046 unweighted; see top panel, Table 7). However, much more substantial increases are evident at private schools in every selectivity tier (0.0101 any selectivity, 0.0094 highly selective, 0.0120 less selective, and 0.0167 non-classified, all unweighted) in terms of the typical school (see second panel, Table 7). These estimates in the second panel closely resemble the NSF private school estimates from Table 4, suggesting an increase in STEM college completions at private schools in the wake of affirmative action bans.

The findings for all states also differ when the period is truncated to 1991–2009. Contrasting the analysis using 1991–2009 for all states (third panel, Table 7) with the main findings using Barron's index (first panel, Table 3), this study finds a negative impact of bans on URM STEM completions at both public and private schools (Table 3), but a positive effect of bans on URM STEM completions at public and private schools of any selectivity (Table 7). While the findings in this paper, as well as those of Hill (2017) and Bleemer (2020), generally do not support Arcidiacono et al.'s (2016) hypothesis that affirmative action bans would increase URM STEM completions, these latter results (from the second and third panels in Table 7) suggest there might be some credibility to Arcidiacono et al.'s argument. However, there is tension here, too, because Arcidiacono et al.'s hypothesis is supported only by the specifications that use Barron's classification, not those that use SAT deciles, as the measure of selectivity. Furthermore, given the different estimates for the impact of affirmative action bans on URM STEM and non-STEM completions under the DHS and NSF taxonomies, it is possible that Arcidiacono et al.'s (2016) findings (which were based on a self-constructed measure of STEM) may estimate an increase in URM STEM college completions only in the immediate aftermath of the bans. However, without a CIP-code crosswalk, Arcidiacono et al.'s (2016) findings cannot be investigated using this more extensive data set.

Discussion and Conclusion

This study broadly supports the existence of a URM STEM decline in college completion due to affirmative action bans (Bleemer, 2020; Hill, 2017) and mostly rejects Arcidiacono et al.'s hypothesis that affirmative action bans would increase URM STEM degrees. The study suggests that only a specifically tailored analysis (where what counts as STEM/non-STEM is arbitrarily chosen and inconsistent with existing STEM taxonomies) could produce results that support Arcidiacono et al.'s hypothesis. Furthermore, this study confirms Hill's conclusion that affirmative action bans affect the proportion of STEM and non-STEM graduates who are URMs. However, this study differs from Hill (2017) by finding evidence of a STEM penalty relative to non-STEM. By discovering that the magnitude of these impacts is shaped largely by the choice of STEM taxonomy, this study raises questions about the congruence of STEM taxonomies. The DHS taxonomy produced a mild STEM penalty (a 3.9% decline in STEM vs. 3.2% in non-STEM), while the NSF taxonomy generated a much larger one (13.1% in STEM vs. 5.2% in non-STEM).

Highlighting the differences between these measures, this study concludes that the STEM penalty from affirmative action bans is minimal, and perhaps non-existent, under the DHS taxonomy, but may be larger under the NSF taxonomy. Because each measure is constructed from different sets of majors, this suggests that particular STEM majors may be driving the results. This conclusion is congruent with other research documenting variation in URM course grades and attrition across STEM disciplines (Whitcomb & Singh, 2021). Thus, Arcidiacono et al.'s (2016) hypothesis may primarily be driven by the majors chosen for their self-constructed STEM taxonomy. As such, the most prudent approach STEM scholars can take to avoid taxonomic confusion is to focus on individual majors (specifically the six-digit CIP series). Scholars should fully describe, and be transparent about, the limitations of the STEM taxonomy they use, whether it is self-generated or drawn from a taxonomy created by a government entity.

This study also presents the first empirical evidence of the impact of affirmative action bans on URM STEM college completions at private schools. However, the narrative for those results is the one most complicated by the choice of STEM taxonomy. The DHS taxonomy produced an estimated 6.8% decline in URM STEM college completions, while the NSF taxonomy estimated a 6.8% increase. The NSF taxonomy suggests that some URM STEM students may have chosen to attend and ultimately graduate from private schools, while the DHS taxonomy indicates that private schools may have experienced a spill-over effect and become de facto targets of the bans.

Furthermore, this study shows the importance of model specification and of measurement via taxonomies and school selectivity. It approximates the specifications found in previous STEM and affirmative action studies as closely as possible by changing the years used in the analysis and restricting the group of ban states to Texas and California. Specifications limiting the inclusion of states and the analytic period lend some empirical support to Arcidiacono et al.'s (2016) hypothesis that affirmative action bans could have increased URM STEM attainment. However, these results were short-lived: under both STEM taxonomies, the long-term impacts of the bans show a growing STEM decline over time. Thus, this study offers additional and robust support for the findings of Hill (2017) and Bleemer (2020) that affirmative action bans decrease URM STEM college completions. Furthermore, Arcidiacono et al.'s (2016) hypothesis was supported only in the model specification using Barron's Competitiveness Index as the measure of selectivity, not SAT quantiles.

Additionally, the impact of the bans has not waned over time for undergraduate education as it has for medical schools (Mickey-Pabello & Garces, 2018). This study shows that the impacts on undergraduate education have not simply persisted (Long & Bateman, 2020) but have grown stronger over time (see Table 6), suggesting the bans are likely one cause of the rapid decline in STEM attainment by Black students since 2000 (Smart, 2020). One might assume that the extreme selectivity of medical schools causes the two different outcomes. However, this pattern remains even when only the most highly selective undergraduate institutions are considered. Speculatively, the mechanism underlying the discrepancies may not be selectivity; rather, it may be politically external and organizationally internal forces facilitating or blocking the ethnoracial diversity efforts of colleges and universities that operate under affirmative action bans. Universities have organizational-level mission statements and core beliefs to follow and are also pressured by their professional organizations and accreditation bodies. An institution may have separate accreditation bodies for its medical school (e.g., the Liaison Committee on Medical Education and the Commission on Osteopathic College Accreditation) and its undergraduate education (e.g., the New England Commission of Higher Education). For example, Wayne State University's medical school in Michigan, a state with an affirmative action ban, was cited for an accreditation violation in 2015 because it failed to admit enough URMs. This emphasizes that schools have numerous policy standards to adhere to beyond those imposed by affirmative action bans. For medical schools, pressures to racially diversify medical education appear to have resulted in increased efforts to promote diversity despite affirmative action bans, while at 4-year bachelor's degree-granting institutions, they have not.

Further work should seek empirical support for this speculation. However, Long and Bateman (2020) argue that alternative strategies to achieve diversity in a race-neutral context make this extremely difficult to study empirically: many postsecondary admissions offices are not transparent about their recruitment strategies and do not subject those strategies to randomized controlled testing, which would give insight into their effectiveness.

The results of this study underscore that the various STEM measures can lead to different narratives around the role of STEM in education and occupations (Stevenson, 2014). The DHS measure is more inclusive of the social sciences as STEM because it includes subfields within education, psychology, and economics. The DHS measure is also more precise about what does and does not count as STEM because it uses the six-digit CIP code series from the National Center for Education Statistics (as opposed to the four-digit series used by the NSF). Together, these differences produce different narratives about the role of affirmative action bans in STEM degree attainment. Specifically, they have the potential to yield misleading conclusions in other STEM-related studies or about how federal dollars should be allocated. Considering that fourteen federal government agencies provide support for STEM education, consistency around STEM would be ideal, and more transparency about how these various entities define and operationalize STEM would be welcome.Footnote 5
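To make the six- versus four-digit distinction concrete, classifying a major against a taxonomy reduces to a prefix match on its CIP code at the taxonomy's level of precision. The sketch below is hypothetical: the codes shown are illustrative of the economics example above, and neither agency's actual list is reproduced here:

```python
# Hypothetical sketch: STEM classification as a CIP-code prefix match.
# Codes are illustrative; neither taxonomy's actual list is reproduced.
def is_stem(cip_code: str, stem_codes: set) -> bool:
    """True if the major's CIP code matches any code in the taxonomy,
    at whatever digit precision that taxonomy specifies."""
    return any(cip_code.startswith(code) for code in stem_codes)

six_digit = {"45.0603"}  # a single economics subfield counted as STEM
four_digit = {"45.06"}   # the whole economics family counted as STEM

print(is_stem("45.0603", six_digit), is_stem("45.0601", six_digit))  # True False
print(is_stem("45.0601", four_digit))  # True
```

A six-digit list can admit one subfield while excluding its siblings; a four-digit list necessarily sweeps in the entire family, which is one mechanical source of the divergent estimates above.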

There may also simply be too many STEM measures, which creates an inconsistent representation of STEM in society. The U.S. Department of Commerce has a STEM taxonomy for the workplace. The Department of Homeland Security has another STEM taxonomy (used in this study) directly related to a student's eligibility for an F-1 visa through the STEM Optional Practical Training (OPT) extension. The National Science Foundation has separate taxonomies for STEM majors (also used in this study) and STEM occupations. Furthermore, some scholars create their own STEM and academic major classifications without Classification of Instructional Programs crosswalks, which further complicates comparing across studies or replicating other work with a different data set (Arcidiacono et al., 2016; Riegle-Crumb et al., 2019). Others choose physical sciences, engineering, math, and computer sciences (PEMC) as an alternative yet do not identify which CIP codes distinguish it from other STEM taxonomies and measures (Dika & D'Amico, 2016; Dika et al., 2022; Schneider et al., 2015).

STEM definitions have different policy manifestations. The Secretary of Homeland Security, Alejandro Mayorkas, states that “STEM innovation allows us to solve the complex challenges we face today and make a difference in how we secure and protect our country. Through STEM education and training opportunities, DHS is expanding the number and diversity of students who excel in STEM education and contribute to the U.S. economy.” (U.S. Department of Homeland Security, 2022). Thus, the primary function of STEM classification for DHS is related to immigration policy.

On the other hand, the NSF supports research across all fields of science and technology, primarily through research grants. Thus, changes in STEM definitions have funding implications tied to them. In 2016 alone, the NSF spent $1.187 billion on STEM education (The Office of Science and Technology Policy, 2016). The bottom line is that there is a lack of consensus about what exactly STEM is, yet these classifications play an influential role in political, academic, and public spheres (Granovskiy, 2018).