Sterile Neutrinos and the Global Reactor Antineutrino Dataset

We present results from global fits to the available reactor antineutrino dataset, as of Fall 2019, to determine the global preference for a fourth, sterile neutrino. We have separately considered experiments that measure the integrated inverse-beta decay (IBD) rate from those that measure the energy spectrum of IBD events at one or more locations. The software used is the newly developed GLoBESfit tool set which is based on the publicly available GLoBES framework and will be released as open-source software.

Nuclear reactors have been workhorses for neutrino physics since its inception as an experimental science by Cowan and Reines [1]. A number of important measurements of fundamental neutrino 1 properties have been made using reactor neutrinos: we highlight the confirmation of solar neutrino flavor conversion by KamLAND [2] and the measurement of θ 13 [3][4][5]. Neutrinos are not produced in the fission process itself but in the subsequent beta decays of neutron-rich fission fragments. The first reactor neutrino experiments already had to deal with the question of how to predict the neutrino flux from these complex sources [6]; interestingly, the uncertainty quoted was 5-10%. The neutrino flux question has taken on a central role with reactor antineutrino anomaly (RAA) [7]: an initial reactor flux re-evaluation [8], later confirmed by one of the present authors [9], lead to an increase in predicted reactor neutrino event rates and thus to an observed deficit of reactor neutrinos.
An exciting, albeit speculative, explanation would be provided by oscillations into a sterile neutrino with ∆m 2 1 eV 2 . Unsurprisingly, this hypothesis has triggered significant interest [10], resulting in a number of new reactor neutrino experiments aimed at finding an actual oscillation signature. At the same time, this interest also resulted in a significant effort, both theoretically and experimentally, to improve reactor neutrino flux predictions; see, for instance, Ref. [11] and references therein.
The interpretation of reactor neutrino data in terms of sterile neutrino oscillations is relatively straightforward and free of major tension, even when combined with other available electron neutrino disappearance data; see, e.g., Ref. [12,13]. However, when an attempt is made to explain also the anomalous results in theν µ →ν e and ν µ → ν e channels as observed by LSND [14] and MiniBooNE [15] significant tension arises: for a sterile neutrino to be able to mediate the transition between two different active flavors, the sterile neutrino must mix with both of these flavors. As a result, there is a welldefined relationship between the amplitude of ν µ → ν e oscillations and the expected effect in the disappearance channels ν µ → ν µ and ν e → ν e . The size of the RAA is consistent with sterile neutrino interpretation of LSND and MiniBooNE, but no evidence is found for commensurate disappearance in the ν µ → ν µ channel. This issue has persisted for some time and there is debate about the statistical significance of this tension, for a recent review see Ref. [16]. Nonetheless, the lack of ν µ disappearance presents a considerable deficiency in the sterile neutrino hypothesis. Here, we remain agnostic with respect to LSND and MiniBooNE and focus entirely on the electron neutrino sector.
Given the significant ongoing experimental effort in reactor neutrino experiments, combined with the potential for a major discovery of new physics, we will discuss in detail the existing reactor antineutrino data. We will compare the data with the Huber-Mueller flux model, as well as with two state-of-the-art flux predictions, one based on the summation approach [17] and one based on the conversion approach [18]. We will critically examine how these differing flux models affect the evidence for the sterile neutrino hypothesis. Moreover, we present an open-source software framework called GLoBESfit based on the established GLoBES [19,20] software suite, which will allow users to fit their preferred model to the data. We also hope that this open framework can be adopted by the experimental community to share their results.
A major result of this work is that we find if we restrict the analysis to data taken after 2010 -i.e., the modern θ 13 reactor experiments and their short-baseline counterparts -then the reactor rate anomaly seems to be enhanced. This restriction should not be misconstrued as a lack of trust in the older experiments, but it does reflect the fact that data curation and release practices were less refined in the past. As a result, much less detailed information is available for properly including older experiments into a global fit.
Best practices in data sharing are crucial for this field to make progress, since we have previously shown [21] that no single, existing reactor experiment can definitively test the currently preferred parameter space. Therefore, global fits will play an important role in neutrino physics for the foreseeable future, reinforcing the value of an open fitting framework.
This manuscript is arranged as follows. In Sec. II, we discuss the predictions of the fluxes of antineutrinos at nuclear reactors that we use, including how systematic uncertainties on these predictions are incorporated in our analyses. In Sec. III, we discuss the total IBD rate experiments that we have included in our fits, and in Sec. IV, we present our combined analyses of various subsets of these data. Similarly, in Sec. V, we discuss the IBD spectrum experiments that we have included in our analyses, and in Sec. VI we present combined analyses of these experiments. We address the effect of the infamous 5 MeV bump [22][23][24] on sterile neutrino searches in Sec. VII, and offer concluding remarks in Sec. IX.
In addition to communicating our findings, one of the intentions of this manuscript is to provide documentation for the tools that we have developed. Pursuant to this, we have included several appendices that deal with technical aspects of GLoBESfit. In Appendix A, we provide an account of how oscillations involving a sterile neutrino have been included in GLoBES, and in Appendix B, we provide a general overview of the files that constitute GLoBESfit, including their content and some aspects of their functionality. Lastly, we provide extensive data tables and supplemental figures in Appendix C; these data are important for our analyses, but would have been intrusive to have included in the main text.

II. REACTOR ANTINEUTRINO FLUX MODELS
Predictions of the spectra of antineutrinos from reactors have played a central role from the inception of neutrino physics as experimental science [25]. More recently, the RAA [7] has put a spotlight on flux models; for a recent review on flux models and how they are produced, see Ref. [11]. For the discussion here, we note the following the salient features. Neutrinos are produced not by the fission process itself, but by the beta decays of about 800 fission fragment isotopes, corresponding to about 10,000 beta decay branches. A large fraction of these beta decays are of (unique and non-unique) forbidden type, which implies a significant dependence on details of nuclear structure for both the emitted electron and antineutrino spectra. We will use the following three flux models: 1. Huber-Mueller (HM): This flux model [8,9] has become the de facto standard in the field and is based on the conversion of integrated beta spectrum measurements performed in the 1980s at the Institut Laue-Langevin [26][27][28]. This flux model has relatively little dependence on nuclear databases and employs the allowed approximation, i.e., all beta decays are treated as allowed decays.
2. Ab initio: This flux model [17] uses fission fragment yields and information on individual beta decays from nuclear databases to directly compute the neutrino flux. It also employs the allowed approximation.
3. Hayen-Kostensalo-Severijns-Suhonen (HKSS): This flux model [18] shares many of the features of HM, that is, it is based on a conversion of the integrated beta spectra. Importantly, it is the first attempt to include effects from forbidden decays in a systematic fashion.
The ratios for the ab initio and HKSS flux predictions relative to the HM predictions are shown in Figs. 1a-1d for, respectively, 235 U, 238 U, 239 Pu and 241 Pu. In each panel, the orange line at 1.0 represents the HM predictions; the blue curve represents the ab initio predictions; and the dark cyan curves represent the HKSS predictions. The associated bands represent the (1σ) systematic uncertainties on these predictions; we discuss how these are calculated below. For comparison, we also show the measured neutrino spectrum from Daya Bay [29] (red) and PROSPECT [30] (pink). Daya Bay can separate the contributions from 235 U and 239 Pu by comparing data taken at different points in the fuel cycle. Meanwhile, PROSPECT is deployed at HFIR, which is fueled with highly-enriched uranium meaning nearly all fissions are of 235 U. Visual inspection indicates that all three flux models and the neutrino data agree for 239 Pu within the large error bars. However, the ab initio flux model predicts a lower flux for 235 U than either HM or HKSS, which seems to agree somewhat better with data. Comparison of the data with any of three flux models reveals a bump in the region around 5-6 MeV. In later sections, we will perform a more careful analysis; impressions from this intial inspection will essentially be confirmed.
In all these calculations, one should formally be accounting for nonequilibrium corrections to the antineutrino flux; these are approximated by Mueller, et al., in Ref. [8] for 235 U, 239 Pu and 241 Pu. However, the collaborations often do not publish information about their burn-up, so it is unclear how large the nonequilibrium correction should be. Even taking a nonequilibrium correction corresponding to 450 days of irradiation -the largest such correction presented in Ref. [8] -the difference between this the absence of nonequilibrium correction is typically overwhelmed by both experimental and other theoretical uncertainties. Therefore, we do not expect these to meaningfully impact the final results; we therefore elect to exclude them from the present analyses.
The experimental quantity of interest in all cases is the convolution of the neutrino flux with the inverse beta decay (IBD) cross section σ IBD and the electron-type survival probability P ee (see Eq. (A.7) in Appendix A). We define the flux weighted cross sections σ i X as follows: This quantity provides the IBD rate for antineutrinos with true energy in the range [E min , E max ] calculated using flux model X = HM, AI, HKSS for isotope i = 235, 238, 239, 241. The total IBD rate in this energy window is then the sum of the σ i X , weighted by the appropriate effective fuel fraction f i , i.e., the fraction of fissions due to each isotope during the operation of the experiment. In this work, we use the IBD cross section published in Ref. [31] -particularly, the results up to and including O(1/M ) as given in their Eqs. (14) and (15). The uncertainty on this cross section is negligible compared to the uncertainties on the fluxes.
We consider two types of reactor antineutrino measurements in this work. In Secs. III and IV, we consider experiments that measure the total IBD rate, but not necessarily the specific energy of a given event. For these, we will take [E min , E max ] = [1.8, 8.0] MeV. On the other hand, in Secs. V and VI, we consider measurements of the IBD spectrum over various energy bins, each with its own [E min , E max ]. Our treatment of the flux systematics at total IBD rate experiments is common to all such experiments; the flux systematics at spectrum experiments must be handled on a case-by-case basis.
We use the following generic procedure to incorporate systematic uncertainties stemming from flux predictions into our analyses.
1. We use information presented in the literature to assign a covariance matrix to a set of flux predictions.
2. We then generate random variations of these fluxes using this covariance matrix to numerically calculate σ i X in Eq. (II.1), assuming P ee (E ν ) = 1, for some specified [E min , E max ].
3. For total rate analyses, we take [E min , E max ] = [1.8, 8.0] MeV and calculate the σ i X and their correlations. These results will be applied to all total IBD rate measurements simultaneously, as will be described in Sec. IV. 4. For spectral analyses, we calculate fractional deviations in the individual bins, including correlations. Usually, we will be interested in the ratio of two such spectra. The resulting covariance matrices are then combined with other experimental systematics; see Secs. V and VI for more details.
When we numerically integrate over the predicted fluxes, we do so by logarithmically interpolating/extrapolating on the published flux models.  [8,9] and are, of course, equal to one everywhere. The blue curves are for the ab initio fluxes presented in Ref. [17]. The dark cyan curves are for the updated conversion method calculation presented in Ref. [18], which we call HKSS. The shaded regions represent the corresponding uncertainties, being equal to the square root of the diagonal elements of the global covariance matrix. Note that the ab initio fluxes have no corresponding uncertainties; see text for details. The data points are based on antineutrino measurements performed at reactors, Daya Bay [29] (red) and PROSPECT [30] (pink).

HM Fluxes
For the HM fluxes we use the central values published in Tables VII-IX of Ref. [9] for the fluxes from 235 U, 239 Pu and 241 Pu, respectively, whereas we use the values from the "N ν , 12 h" column of Table III of Ref. [8] for 238 U. Regarding errors, we use the following prescription. For 235 U, 239 Pu and 241 Pu, we take the (percent) errors listed in the columns headed "stat." and "bias err." from Tables VII-IX of Ref. [9], respectively, to be fully uncor-related between bins and these three isotopes; and the columns headed "Z," "WM" and "norm." are added in quadrature and taken to be fully correlated between bins and these three isotopes. We assume all errors to be Gaussian; asymmetric errors are replaced by a Gaussian with width given by the geometric mean of the upper and lower uncertainties. For 238 U, we combine the errors in the "Nuclear databases" column of Table II in Ref. [8]the correlations of which are shown in Table I of the same reference -along with the "Missing info." column of the same -which we take to be fully correlated. These uncertainties yield the orange bands in Figs. 1a-1d.

Ab Initio Fluxes
The values of the antineutrino fluxes can be found in the Supplementary Material to Ref. [17]. 2 These predictions, however, are not published with systematic uncertainties; consequently, we show no blue shaded regions in Figs. 1a-1d.
In our analyses, we will always apply the systematic errors associated to the HM fluxes to the ab initio fluxes, as well. This is likely an optimistic assignment; though precise uncertainties have not been calculated, the true systematic uncertainty on these predictions is likely of order 5%. However, we will argue below that this treatment is sufficient for the conclusions of this work. For now, we simply point out that this treatment results in a more aggressive assignment of statistical significances than a more realistic error budget would produce. 2 We thank Muriel Fallot for providing us with these fluxes in machine readable format [32].

HKSS Fluxes
The fluxes are presented in Tables VI-IX of Ref. [18]; specifically, we use the results in the columns headed "δN Num.," which represent the (percent) rescaling introduced by including first-forbidden decays to the reactor antineutrino flux. Systematics are handled by splicing together a subset of the errors published in Ref. [18] with those for the HM fluxes from Ref. [8,9]. Specifically, we use all the components of the HM uncertainty budget discussed above, and we add to this the uncertainties under the columns headed "g A " and "Param." in Tables VI-IX of Ref. [18]; these latter components are each taken to be totally correlated between bins and isotopes. These uncertainties yield the dark cyan bands in Figs. 1a-1d.
The procedure outlined above gives us the following isotopic IBD yields: σ 235 HKSS = (6.67 ± 0.15) × 10 −43 cm 2 /fission σ 238 HKSS = (10.08 ± 1.14) × 10 −43 cm 2 /fission σ 239 HKSS = (4.37 ± 0.12) × 10 −43 cm 2 /fission σ 241 HKSS = (6.06 ± 0.14) × 10 −43 cm 2 /fission Note that these have increased by ∼ 1% relative to the nominal HM predictions. The uncertainties have also grown, stemming from uncertainty related to g A and the parametrization procedure used in Ref. [18], but the change is quite modest -including first-forbidden contributions has not resulted in a grossly inflated error budget. These results are described by the following correlation matrix: Unlike for the HM and ab initio flux predictions, the flux from 238 U is now correlated with the three other isotopes via the axial coupling g A .

III. RATE EXPERIMENTS
In this section, we describe the experiments that have measured the IBD rate that enter into our analysis and how these are defined in GLoBES. We are ultimately interested in the ratio of the measured IBD rate relative to one of (1) the HM predictions, (2) the ab initio predictions, or (3) the HKSS predictions. These are compared against the experimentally determined ratios relative to the same flux prediction. Many of the details regarding specific implementations of these experiments are the same between them; we relegate discussion of these to Appendix B.

Bugey-4
The detector width is taken to be 3.0 m and the core is assumed to be point-like. The center-to-center distance between the core and the detector is 15 m. The fuel fractions published by the collaboration [33] are The experimental resolution is not published by the collaboration. In the absence of further input, we take the resolution to be 6%/ E[MeV], which is the stated resolution for Bugey-3. The published total IBD yield is σ = 5.752 × 10 −43 cm 2 /fission [33]; this leads us to R HM = 0.941, R AI = 0.979 and R HKSS = 0.933. Following Refs. [13,34], the experimental uncertainty is taken to be 1.4%, which is entirely correlated with Rovno 91.

Rovno 91
Rovno 91 used the same detector as Bugey-4; the baseline, however, is different -namely, 18 m. The fuel fractions published by the collaboration [35]  The resolution is not published by the collaboration. In the absence of further input, we take the resolution to be 6%/ E[MeV], as for Bugey-4. The absolute IBD cross section measured at Rovno 91 is σ = 5.85 × 10 −43 3 Refs. [13,34]  cm 2 /fission [35]. We find R HM = 0.939, R AI = 0.983 and R HKSS = 0.931. The total experimental uncertainty is 2.8%, of which 1.4% is correlated with Bugey-4.

Bugey-3
The collaboration reports IBD rate measurements at 15 m, 40 m and 95 m. The fuel fractions published by the collaboration [36] are: (f 235 , f 238 , f 239 , f 241 ) = (0.614, 0.074, 0.274, 0.038) , which we assume applies to all three measurements. The collaboration claims an energy resolution of The collaboration only publishes the ratios of their measurements with respect to Refs. [27,28]; we instead derive IBD cross sections from their stated event rates and experimental specifics and find σ = {5. 75

ILL
We mostly follow Ref. [38]; however, we also correct for the fact that Ref. [39] reports that the original power of the ILL experiment was underreported by 9.5%. The detector width is 2.0 m, the core is assumed to be pointlike and the center-to-center distance is 8.76 m. The core consists of 93% enriched 235 U; our analysis assumes that this is the only relevant isotope. The energy resolution is taken to be 18%/ E[MeV]. We determine from Refs. [38,39] that σ = 4.76×10 −43 cm 2 /fission, implying the ratios R HM = 0.824, R AI = 0.881 and R HKSS = 0.816. The experimental uncertainty is 9.1%, of which 3.8% is correlated with each of the three Gösgen measurements.

Krasnoyarsk 87
The detector is taken to have a width of 2.0 m while the core is point-like and comprised purely of 235 U. The center-to-center distances at these experiments are 32.8 m and 92.3 m. The energy resolution is not specified in Ref. [ We essentially repeat the analysis of Krasnoyarsk 87, except that the center-to-center distance is now taken to be 57.0 m. As before, the reactor is approximated to be purely 235 U and the energy resolution is 20%/ E[MeV]. The collaboration publishes the IBD yield as σ = 6.26 × 10 −43 cm 2 /fission [41]; this results in R HM = 0.945, R AI = 1.011 and R HKSS = 0.936. The uncertainty is 4.2% and is uncorrelated with any other experiment.

Krasnoyarsk 99
We again re-use the Krasnoyarsk analysis with a center-to-center distance of 34.0 m and a core of pure 235 U. The energy resolution is again taken to be 20%/ E [MeV]. The collaboration publishes the IBD yield as σ = 6.39 × 10 −43 cm 2 /fission [42]; this yields R HM = 0.964, R AI = 1.032 and R HKSS = 0.956. The corresponding uncertainty is 3.0%, which is uncorrelated with any other experiment.
Ref. [42] also includes measurements of charged-and neutral-current deuterium disintegration rates, but these are significantly less precise than the IBD rate measurement; we do not include these in our fits.

Savannah River
The detector is taken to have a width of 2.0 m and the center-to-center distances are 18.2 m and 23.8 m in the two detector configurations with a core of essentially pure 235 U. The energy resolution is taken to be 20%/ E[MeV]. The collaboration does not explicitly report their inferred IBD rate, but from the integrated rates in Table III of Ref. [43] and other experimental information, we derive σ = {6.07, 6.48} × 10 −43

Rovno 88
There are five measurements under the Rovno 88 umbrella [44]. The first two (1I and 2I) are polyethylene detectors studded with 3 He proportional counters, each located ∼18 m from the core. The latter three (1S, 2S and 3S) are liquid scintillation detectors; 1S and 3S are located 18 m from the core, while 2S is located 25 m from the core. 4 Each detector has a width of 1.0 m, which is roughly consistent with Fig. 1 of Ref. [44], and the resolution is taken to be 20%/ E [MeV].
Each measurements takes different effective fuel fractions: In Table III of Ref. [44], the collaboration has provided their inferred IBD rate for each of these five experiments. However, in Table II

Nucifer
Given the short distance between the core and detector (7.21 m), one might worry that the finite extent of both the reactor core and of the detector may be important in determining the correct oscillation probabilities. However, given that Nucifer reports a total rate measurement, and given the relatively poor energy resolution, we find that this effect is not quantitatively important here. The composition of the core is [46] (f 235 , f 238 , f 239 , f 241 ) = (0.926, 0.008, 0.061, 0.005) .
The core is dominated by 235 U, but since the effective fuel fractions for all four isotopes have been provided, we take these into account. The collaboration claims an energy resolution of 20%/ E[MeV]. From their reported event rate and other experimental specifics, we estimate the IBD rate to be σ = 6.847 × 10 −43 cm 2 /fission. This leads to R HM = 1.046, R AI = 1.115 and R HKSS = 1.036. The experimental uncertainty is 10.7% and is uncorrelated with any other experiment.

III.2. Medium-Baseline Experiments -Total Rates
We now turn our attention to medium-baseline oscillation experiments. At these distances, O(10 2 − 10 3 ) m, two-flavor oscillations are insufficient; we must at least account for nonzero θ 13 and ∆m 2 31 (or the effective mass splitting ∆m 2 ee ). In practice, we include full four-neutrino oscillations, though we keep the relevant parameters from three-neutrino oscillations fixed at their best-fit values from Ref. [47]:  Note that sin 2 θ 23 does not factor into this analysis.
We define each detector width to be 4.0 m and take the reactors to be point-like; this helps high-frequency oscillations to average out smoothly, but otherwise does not impact our analysis. Since medium-baseline experiments are sourced by multiple reactors, our GLoBES definitions must account for the relative contributions of the reactors.
As mentioned, three-neutrino effects are important at these baselines; this can cause some ambiguity in the definition of the ratio R. In this work, we always assume that R is defined with respect to the absence of oscillations. We contrast this with the expected deficit within the three-neutrino framework relative to the absence of oscillations, which we will call R 3ν .

Palo Verde
Palo Verde [48] consists of three reactor cores; one is located 750 m from the detector and the other two are at 890 m. Table I of Ref. [48] presents the operating cycle over the course of the experiment. Assuming the total power of 11.63 GW th is split evenly among the three reactors over the duration of the experiment and accounting for efficiency, the total exposure from the 750 m reactor is The resolution is also not stated in Ref. [48], so we assume it to be 20%/ E[MeV]. We use information in Ref. [48], along with the total proton number in Ref. [49], 5 to determine σ = 6.036 × 10 −43 cm 2 /fission. From this, we calculate R HM = 0.971, R AI = 1.014 and R HKSS = 0.963. The uncertainty is taken to be 5.4%, which is uncorrelated with any other experiment. For context, Ref. [51] reports a threeneutrino expectation of R 3ν = 0.967.

Double Chooz
Double Chooz [52,53] consists of two reactors and two detectors; this measurement pertains to the near detector, from which the two reactors are 355 m and 469 m away. We assume that each reactor operates for the same amount of time over the course of the experiment, and that the average powers delivered during those times are the same. The effective fuel fractions are assumed to be We take the energy resolution to be 8%/ √ E [52]. Ref. [53] reports σ = 5.71 × 10 −43 cm 2 /fission. However, this result has been corrected for the effects of nonzero θ 13 . We calculate R HM = 0.934, R AI = 0.969 and R HKSS = 0.926. The corresponding uncertainty is 1.4%, which is uncorrelated with any other experiment.

Chooz
Chooz [54] consists of two reactors located 998 m and 1115 m away from the detector. The two reactors have different run times and effective powers; the operation periods are broken down in Table 1 of Ref. [54]. We assume that the reactors operate at the same average power during the period in which both are operational; this implies that reactor 1 (1115 m) delivered 12715.5 GW·h over the experiment, while reactor 2 (998 m) delivered 8556.5 GW·h. The effective fuel fractions are assumed to be the same for each reactor: We take the energy resolution to be 0.5 MeV; Ref. [54] claims a resolution of 0.33 MeV, but we use the more conservative estimate here.

Daya Bay
The details about the experimental geometry are well documented in Refs. [55,56]. We tabulate relevant experimental specifics in Tables XIII-XVII in Appendix C. The fuel evolution result that we use corresponds to the 1230-day data release; an analogous analysis for the 1958day data appeared in Ref. [29], which we look forward to including in the future.
AD8 did not operate for as long as AD1-3 in the 1230day dataset; this is characterized by the exposure for detector d, defined as where P 6AD,8AD r represents the average power of reactor r during either the 6AD or 8AD period, and t 6AD,8AD d is the operating time of the detector during these periods. The powers are presented in Table XIV. For all detectors, t 8AD AD8 has t 6AD d = 0 days. Moreover, we account for the different total AD target masses and efficiencies, given in Table XV.
The Daya Bay data are presented in bins of differing fuel fractions. We present the average fuel fractions of these bins and the corresponding IBD rate measurements in Table XVI. Our calculations of the ratios R HM , R AI and R HKSS are also shown in Table XVI. Our analyses employ the covariance matrices provided by the collaboration in the Supplementary Material to Ref. [57], appropriately converted from absolute IBD rate to ratio with respect to our IBD predictions.

RENO
In Table XVIII, we show the distances between each RENO reactor and the near and far detectors; in Table  XIX, we show the average power of each reactor during the experiment; 6 and in Table XX, we show other important experimental information, including the detector operating time and efficiency. These data will also be used in the RENO spectral analysis; here, we only need the information for the near detector.
We take the flux-weighted average baseline to be 433.1 m. Out definition of "average baseline" is different from that of RENO [59]; we weight by the reactor flux at the 6 We thank Soo-Bong Kim for providing this information [58]. detector, but RENO considers the average distance that a neutrino that has already interacted in the detector would have traveled. This is why we don't use the 410.6 m that the collaboration publishes. We take the energy resolution to be a constant 0.4 MeV. As with Daya Bay, the RENO collaboration reports their total IBD yield for eight different sets of fuel fractions; we present these fuel fractions and the corresponding measured IBD rates, obtained by digitizing Figure  2 in Ref. [59], in Table XXI. In Table XXII, we show our estimates of the ratios using the HM, ab initio and HKSS fluxes. In our analyses, we use the covariance matrix provided by the collaboration in the Supplementary Material to Ref. [59].

IV. RATE ANALYSES
In this section, we discuss how the experiments in the previous section constrain the mass and mixing of an additional sterile neutrino. This is one possible application of GLoBESfit -the analysis techniques we describe can be modified for any one of a number of new-physics scenarios.

IV.1. Data -A Combined View
We depict the data discussed above in Figs. 17 and 18. The orange data points represent the ratio R of measured IBD rates relative to the prediction calculated using the HM flux model. The blue data points are the same using the ab initio fluxes; the dark cyan points employ the HKSS fluxes. The depicted error bands are experimental only; theoretical uncertainty is of order ∼ 2.5%.
As discussed in Sec. II, the reduction of the antineutrino flux from 235 U, in particular, in the ab initio fluxes results in a diminished predicted IBD rate, implying larger experiment-to-prediction ratios. On the other hand, the HKSS fluxes almost universally result in a higher predicted flux, resulting in a diminished ratio R.
For each group of correlated measurements, we show the 68.3% and 95% confidence level (C.L.) contours derived for each flux model with solid and dashed curves, respectively, in Figs. 2-16; the captions detail which experiments are depicted therein. Results using the HM fluxes are shown in orange; those using the ab initio fluxes are show in blue; and those using the HKSS fluxes are shown in dark cyan.
In Tables I, II and III, we tabulate relevant statistics for our analyses using the HM, ab initio and HKSS fluxes, respectively. Specifically, we show the minimum value of the χ 2 for each group of experiments, χ 2 min , as well as the χ 2 in the limit of no mixing with a sterile neutrino, χ 2 3ν . The difference between these is used to determine the pvalue for each group of experiments, assuming these are chi-squared distributed.

IV.2. Combining Experiments
To analyze all of these experiments simultaneously, we combine the uncertainties on the total rate experiments into one covariance matrix, the elements of which are given by correlated and uncorrelated uncertainties described in Sec. III. We use separate covariance matrices for Daya Bay and RENO, which have been provided by the respective collaborations. We incorporate systematic uncertainties stemming from the flux predictions via four nuisance parameters (one for each of 235 U, 238 U, 239 Pu and 241 Pu), which we bundle into a vector ξ. These flux predictions are correlated; we account for these correlations using the covariance matrices discussed in Sec. II.
The chi-squared function we employ has the following form: where R exp is the vector of experimental ratios and R pred = R pred (sin 2 2θ ee , ∆m 2 41 , ξ) is the vector of predicted ratios. Our treatment of theoretical uncertainties has been previously described in Sec. II; the theoretical covariance matrix V th accounts for these in our calculations. Furthermore, V exp is the covariance matrix describing experimental uncertainties. We minimize over ξ for each point in sin 2 2θ ee -∆m 2 41 parameter space. In Figs. 19, 20 and 21, we separately show the regions    I: Values of χ 2 for our analysis -namely, the minimum value (χ 2 min ), the value for sin 2 2θee = 0 (χ 2 3ν )and the p-value at which three-neutrino mixing can be excluded for each experimental block for our analysis of the HM fluxes. The number of data points in each block is denoted n data . We convert the p-value to the number of σ in the last column.  Tables I-III. In each case, the MBL experiments prefer a smaller value of ∆m 2 41 than the SBL experiments, as one would expect. In the region above ∆m 2 41 1 eV 2 , we see that the MBL experiments are generally consistent with the region preferred by the SBL experiments. Both SBL and MBL experiments separately indicate 2σ evidence for a sterile neutrino for the HM and HKSS flux models, though the latter is stronger than the former. Conversely, the ab initio fluxes are much more indifferent to the existence of a sterile neutrino; the 95% C.L. curves do not close for either SBL or MBL experiments, as can be seen from Fig. 20.
In Fig. 22, we show results from combining all SBL and MBL reactor experiments for all flux models we have considered. The solid and dashed curves depict the 95% and 99% C.L. regions, respectively. As before, the HM fluxes are shown in orange; the ab initio fluxes are show in blue; and the HKSS fluxes are shown in dark cyan.  Using the ab initio fluxes decreases the global preference for a sterile neutrino relative to the HM fluxes: the evidence decreased from 2.3σ to 0.65σ. The reason for this, as we have mentioned, is that the overall flux normalization for the ab initio flux prediction for 235 U is ∼ 10% less than the corresponding HM prediction. Therefore, the ab initio expected IBD rates are less than the HM rates, so the evidence for a sterile neutrino is diminished. Moreover, recall that assigning the HM uncertainties to the ab initio flux predictions is optimistic; the true uncertainties associated with the calculation are assuredly larger than the ∼ 2.5% that we have ascribed. Consequently, this flux model is consistent with no sterile neutrino oscillation.
On the other hand, using the HKSS fluxes increases the global preference for a sterile neutrino: the evidence increases from 2.3σ to 2.6σ. The effect of including firstforbidden decays is to increase the expected flux in the region around E ν ∼ 6 MeV; the conversion process thus produces a higher overall antineutrino flux, implying a larger deficit than for the HM fluxes. Additionally, the inclusion of first-forbidden decays has not dramatically increased the uncertainty budget of this calculation, as had been speculated may happen [60].
The observation of this diverging preference for a sterile neutrino is one of the central conclusions of this work. Clearly, at most one of these flux predictions is correct; determining which is paramount if the community wishes to determine how to best pursue this anomaly in the coming decade. It is critical to note that these predictions  are ultimately derived from experimental measurements -the conversion method requires reliable uranium and plutonium fission β spectra, and the ab initio method requires precisely known cumulative fission yields and β decay strengths. Perhaps the most critical step to be taken in improving the antineutrino flux predictions is to improve the underlying data on which they are based.
It is important to stress that HKSS represents the first attempt to systematically include forbidden decays in flux predictions and there are a number of caveats one might raise with respect to the methods used. The shell model works best for spherical nuclei with nearmagic numbers of nucleons, properties decidedly absent in many fission fragments. Moreover, the shape factors reported in Ref. [18] are larger than any observed shape factors. It is worthwhile to point out that for nuclei with non-unique forbidden decays for which the beta spectrum has been measured, the overall spectral shapes are close to the allowed one. A similar caveat applies to Ref. [60]: if the operator which causes by far the largest effect on the neutrino flux, [Σ, r] 1− , were to dominate a given beta decay branch, then a bimodal beta spectrum would result. However, this never has been observed. This may all be indicating that the concerns about forbidden decays in computing reactor neutrino fluxes may have been overstated. Given the scarcity of experimental information that accompanies reports of older reactor experiments, one can reasonably ask how the evidence for the existence  Table IV.
Making this cut removes most SBL experiments, so the barycenters of the fits shift to smaller values of ∆m 2 41 . Imposing this cut increases the evidence for a sterile neutrino for all three flux models. For the HM and HKSS flux models, the evidence increases to 3.0σ and 3.6σ, constituting fairly strong evidence. On the other hand, the significance for the ab initio flux model rises to 1.6σ.
To counteract the disproportionately high relevance of MBL experiments in this fit, we impose an additional ad hoc prior requiring ∆m 2 41 > 0.1 eV 2 and repeat the analysis. The resulting statistics are shown in Table V. The evidence is weakened by imposing this prior, but the same basic trend emerges: the HM and HKSS flux models prefer a sterile neutrino at 2σ, while the ab initio fluxes show negligible preference.

IV.3. Alternate Analysis: Rescaling the HM Fluxes
We consider an alternative to the sterile-neutrino hypothesis: that the data can be explained by simply rescaling the HM fluxes. In particular, we only consider the two dominant fissile isotopes -235 U and 239 Pu -and introduce a rescaling factor of each, respectively r 235 and r 239 . Similar analyses have been performed in   Refs. [29,57,59,61,62]. We scan over the r 235 -r 239 plane in order to determine the extent to which the data prefer a rescaling of the HM fluxes and to determine if this is a more compelling explanation of reactor rate deficits than introducing a sterile neutrino. As in our sterile neutrino analyses, we treat systematics from 238 U and 241 Pu using nuisance parameters. In Fig. 24, we show contours of constant ∆χ 2 in the r 235 -r 239 plane for different subsets of reactor experiments. The red regions refer to our analysis of SBL and MBL experiments that measure the total, timeintegrated rate (namely, all of the above experiments except for Daya Bay and RENO); the purple regions refer to MBL experiments that measure the IBD rate as the effective fuel fractions change, i.e., Daya Bay and RENO; and the gray regions refer to a combined analysis of all experiments. Dark (light) shading represents the 95% (99%) C.L. region.
In Table VI, we present the minimum value of the χ 2 (χ 2 min ), its value at the point r 235 = r 239 = 1 (χ 2 0 ) -indicated by the orange circle in Fig. 24 -and both the corresponding p-value and nσ at which the unscaled HM fluxes can be excluded for each dataset. We also show the 1σ (2σ) preferred region for the HM fluxes in dark (light) orange shading. While the fuel evolution dataset exhibits a mild preference (2.4σ), the total rate dataset presents 4.4σ evidence in favor of rescaling the HM flux predictions. Taken together, these data present 5.0σ evidence for this hypothesis, with the 235 U flux, again, being particularly suspect.
On the other hand, we can directly compare how the ab initio and HKSS flux models compare to this rescaling of the HM fluxes. The blue and dark cyan circles, respectively, represent the relative sizes of the 235 U and 239 Pu isotopic IBD fluxes as compared to the HM prediction, calculated using our results from Sec. II. As for the HM fluxes, the appropriately colored regions represent the 1σ (dark) and 2σ (light) regions for these flux models. These three models generally agree on the flux     from 239 Pu fissions, but disagree quite severely for 235 U, with the ab initio fluxes presenting a ∼6% deficit and the HKSS fluxes presenting a ∼1% enhancement with respect to HM. We quantify this in Table VII, where we show how these alternate flux predictions compare to freely rescaling the 235 U and 239 Pu fluxes. Unsurprisingly, we find that the ab initio fluxes are consistent with the best-fit points from these analyses; the tension between them fails to surpass 1σ for any analysis we have performed. Conversely, the data strongly disfavor the HKSS fluxes: while the fuel evolution data only exhibit 2.9σ tension, the total rate experiments present 5.3σ tension, and all experiments together present 5.9σ tension with this flux model. This again underscores the need to revisit the data that underpin these flux predictions.
An interesting subset of the rescaling hypothesis is to consider r 235 and r 239 being rescaled by the same amount. The line along which r 235 = r 239 is shown in black dashing in Fig. 24. In Fig. 25, we show the values of ∆χ 2 relative to the minimum value in the entire r 235 -r 239 plane as a function of r 235 = r 239 . Shown are the respective curves for integrated rate (red), fuel evolution (purple) and all experiments (black). The solid (dashed) gray, horizontal line is the 95% (99%) C.L. exclusion limit, calculated for two degrees of freedom; we present this as a subspace of the r 235 -r 239 plane of Fig. 24, and not necessarily as a separate hypothesis.
In Table VIII, we show the minimum value of ∆χ 2 along this one-dimensional subspace, ∆χ 2 1D , the corresponding p value and equivalent number of σ. The data are modestly consistent with a uniform rescaling of the 235 U and 239 Pu fluxes; a nonuniform rescaling of the 235 U and 239 Pu fluxes is preferred in an absolute sense, but not dramatically so. None of these analyses excludes a uniform rescaling of the HM fluxes by more than 1.4σ.
We compare the sterile neutrino hypothesis and the rescaled-HM-fluxes hypothesis. For the former, we find χ 2 min /d.of. = 35.0/38, implying p = 0.61 (or 0.51σ). For the latter, we find χ 2 min /d.o.f. = 31.5/38 and p = 0.76 (0.30σ). Clearly, the data mildly prefer for the fluxes to be independently rescaled over introducing a sterile neutrino. This is unsurprising: the the sterile neutrino hypothesis can be mapped onto rescaled-HM-fluxes hypothesis, assuming ∆m 2 41 is sufficiently large that its corresponding oscillations average out at all experiments. Because the former is a subset of the latter, it cannot be more strongly preferred. Introducing the ad hoc restriction that ∆m 2 41 5 eV 2 to ensure that oscillations would average out at all experiments, we find χ 2 min /d.o.f. = 36.7/39 and p = 0.57 (0.56σ). 7

V. SPECTRUM EXPERIMENTS
In this section, we discuss the experiments that enter into our spectral analyses. Unlike the rate experiments, these are all taken to be mutually uncorrelated. Given the increase in the complexity of each measurementi.e., given that we are considering event spectra instead of integrated event rates -the systematics become much richer than in the previous section(s). Consequently, we discuss the statistical analysis of each experiment separately in what follows.
For these analyses, we exclusively employ the ab initio flux predictions. The reason for this is that, because we are interested in the ratios of spectra, one should expect that the choice of flux model does not affect the results of such a study. One could just as easily use the HM or HKSS fluxes; the result are essentially indistinguishable.
We have limited our analyses to considering Bugey-3 [36], DANSS [63], Daya Bay [64], Double Chooz [53], NEOS [65] and RENO [59]. While the short-baseline reactor experiments PROSPECT and STEREO have produced early results, these are not yet competitive with other experiments, so we do not include them at this time. Additional data releases are expected in the near future for both experiments, 8 as well as the SoLid experiment [67]; we look forward to including all of these in future versions of GLoBESfit. 7 We take the number of degrees of freedom to be 39 here. If ∆m 2 41 is large enough for oscillations to average out, then its value cannot be measured and it is not a relevant degree of freedom. 8 While this manuscript was being prepared, Ref. [66] appeared on the preprint arXiv. We will include these data in future analyses, but we have not included them here.

Implementation in GLoBES
Based on design specifications for a 900 MW e series reactor vessel described in Ref. [68], we assume that the height of the active volume of the reactor core at Bugey-3 is 3.66 m tall with a radius of 1.5 m. We further assume that antineutrino production is uniform vertically 9 and azimuthally, but that the radial distribution is given by a zeroth-order cylindrical Bessel function whose first zero lies at the edge of the core.
We only consider the antineutrino spectra measured at the 15 m and 40 m positions. The spectral measurement at 95 m is sufficiently imprecise that we do not expect to gain appreciable sensitivity to a sterile neutrino through its inclusion; this is borne out in Fig. 16 of Ref. [36]. Ref. [36] states that the detector modules used in the experiment are comprised of 98 separate 8. We find that our results are insensitive to the exact relative orientation of either detector and the core.
We precompute the oscillation probabilities at both positions using the geometries outlined above. The quantity of interest is the flux-averaged value of sin 2 (qL) at each experiment, where is independent of the experimental geometry. Note that this definition of q assumes that lengths are given in km, which is the expectation in GLoBES. We define this average to be F (q), and discuss it in more detail in the Appendix (see Eq. (A.8) and the surrounding discussion). We precompute F (q) in intervals of 0.01 in log 10 q over the range q ∈ [10 0 , 10 4 ]. The two positions are included with two separate AEDL experiments in GLoBESfit. We fix @time, @power, @norm and $target mass to be 1.

Statistical Analysis
The upper panel of Fig. 15 of Ref. [36] shows the ratio of the spectrum measured at 40 m relative to that measured at 15 m. We have digitized this figure and the depicted statistical uncertainties for use in our analyses; we reproduce these data in Fig. 38a in Appendix C. In addition to these statistical uncertainties, the collaboration also claims a 2.0% systematic uncertainty on the relative normalization of the two spectra. We have also accounted for a 2.0% energy scale uncertainty for each position and have assumed that these are totally uncorrelated. These are included in our analyses via the GLoBES function glbShiftEnergyScale; see the GLoBES manual for a description of this function [19,20].
We utilize a chi-squared function of the following form: where S exp is the experimental spectrum and S pred is the ratio predicted using GLoBES, assuming the existence of a sterile neutrino with mixing angle sin 2 2θ ee and masssquared splitting ∆m 2 41 . The covariance matrix V B includes statistical uncertainties and the 2.0% normalization uncertainty. The nuisance parameters ξ 15 and ξ 40 correspond to variations in the energy scale at 15 m and 40 m, respectively, with uncertainties of 2.0% apiece. In our fits, we minimize over these nuisance parameters for each value of sin 2 2θ ee and ∆m 2 41 . The results of our benchmark sterile neutrino analysis are shown in Fig. 26. The dark green, green and dark green contours represent the 95%, 99% and 99.9% C.L. contours, respectively. Relevant statistics are summarized in Table IX. Bugey-3 shows a modest preference for a sterile neutrino; we find that the evidence rises to the level of 1.4σ.

Implementation in GLoBES
The DANSS [63] reactor core is a 3.7-meter-tall cylinder with a diameter of 3.2 m. We assume production to be uniformly distributed, but that the radial distribution is given by the zeroth-order cylindrical Bessel function with zero at edge of core; we sample the vertical extent of the core following the burning profile in Ref. [69]. The detector is a 1 m 3 cube located directly underneath the axis of the core; in our analyses, the centers of the core and the detector are taken to be either 10.7 m or 12.7 m apart, and we are ultimately interested in the ratio of the spectra obtained at these distances. We calculate the oscillation probabilities as described in Appendix A. We calculate These are the fuel fractions during the middle of a VVER-1000 reactor operating cycle as presented in Table 1 of Ref. [45].
Statistical Analysis Table 2 in Ref. [63] tabulates the bottom/top spectral ratio, as well as the statistical errors on these ratios; we reproduce these data in Fig. 38b in Appendix C. The DANSS collaboration has not published a thorough analysis of their systematic uncertainties, but the impact of these have been estimated in the final result. Specifically, the following systematics have been included: • The energy scale uncertainty is taken to be 2.0%, and is fully correlated between the upper and lower configurations. This systematic is included via the GLoBES function glbShiftEnergyScale. A nuisance parameter is introduced corresponding to this energy scale shift.
• The uncertainty on the ratio of L −2 for the upper and lower positions is taken to be 2.0%. This introduces correlations between the spectral ratios.
The following chi-squared function is employed in these analyses: where S exp is the measured bottom/top ratio, S pred is the prediction of the same from GLoBES and V D is the covariance matrix describing these data. The nuisance parameter associated with the energy scale is ξ D and the corresponding uncertainty is σ D . In our analysis, we minimize over ξ D for each point in the sin 2 2θ ee -∆m 2 41 plane. The results of our analysis are shown in Fig. 27. The dark green, green and dark green contours represent the 95%, 99% and 99.9% C.L. contours, respectively. Relevant statistics are summarized in Table IX. DANSS presents the most compelling single piece of evidence for a sterile neutrino of the experiments that we have considered -the evidence rises to the level of 3.0σ.

Implementation in GLoBES
Daya Bay consists of an array of eight antineutrino detectors (ADs) and six nuclear reactors. Our analysis of the antineutrino spectrum at Daya Day is based on the 1958-day data release in Ref. [64]. The data have been taken in three phases, corresponding to the active number of ADs: sequentially six (6AD; 217 days), eight (8AD; 1524 days) and seven (7AD; 217 days).
In their 1230-day data release [56], the collaboration reports the average power of each reactor core during the 6AD period and in the first 1013 days of the 8AD period; similar figures have not been published for the 1958-day data. We assume that (1) the average powers during the entire 8AD period are given by the average powers during the first 1013 days, and (2) that the average powers of during the 7AD period are equal to those from the 6AD period. We do not expect these assumptions to dramatically affect the outcome of this analysis, but we look forward to updating this in the future. The average powers during the 6AD and 8AD periods from the 1230-day data are tabulated in Table XIV.
The distances between pairs of reactors and detectors are tabulated in Table I of Ref. [56]; we tabulate these in Table XIII. We treat the reactor cores as being pointlike, but the ADs are treated as 3-meter wide targets, the stated size of the acrylic vessel that contains the Gddoped liquid scintillator [56]. The function F d (q) (see Eq. (A.8)) for each detector d is written as follows: where r indexes the reactor cores, s indexes the three data-taking periods, t s d is the live-time of detector d dur-ing period s, P s r is the power of core r during period s and L dr is the distance between detector d and core r. The detector live-times are given as follows: • ADs 1-6 have t 6AD There are five factors that differentiate each detector: 1. For each detector, we define @time = 1.0 but, since not all detectors have been operational for the same amount of time, we fold these differences into the definition of @power. Namely, this is defined to be 2. The target masses have subpercent-level differences; these are tabulated in the first row of Table  XV and are accounted for using $target mass.
3. The total efficiencies of the ADs also vary; these are given in the second row of Table XV. We set @norm to be the stated value of ε tot for each detector.
4. The effective baseline of each detector, L d , is defined via These are tabulated in the third row of Table XV. 5. Lastly, the effective fuel composition visible to each detector varies; these are tabulated in the last four rows of Table XV.
The energy resolution of each detector is taken to be identical. In the supplementary material to Ref. [56], the Daya Bay collaboration publishes the response matrix that they use in their analysis. In principle, we could use the same response matrix here; to cut down on computation time, we parametrize the response using function The spectrum at each AD is calculated over [1.5, 12.8] MeV in true antineutrino energy in 226 bins. This results in a 0.05-MeV spacing over the visible spectrum; this fine a spacing allows us to recycle our spectrum calculations for our analysis of NEOS, explained in Sec. V.5. These bins are combined at the analysis level to reproduce the Daya Bay binning.
We are ultimately interested in the ratios of total events in EH2 (i.e., AD3 & AD8) and EH3 (AD4-7) to EH1 (AD1, 2), which we call EH2/EH1 and EH3/EH1. The spectrum observed at each EH is determined by simply summing the number of spectrum for each detector in the hall, accounting for the appropriate effective fuel fractions. Taking the ratios of these spectra, which we call S 12 pred and S 13 pred for EH2/EH1 and EH3/EH1, respectively, is then trivial.
We benefit from the extensive data release from the Daya Bay collaboration in the form of supplementary data to Ref. [64]. In particular, the total numbers of observed IBD candidates (signal+background) for each bin in each experimental hall reside in the files DayaBay IBDPromptSpectrum EH<N> 1958days.txt, where <N> = 1, 2, 3; the background spectra have also been published and reside in the files DayaBay BackgroundSpectrum EH<N> 1958days.txt. We extract the observed ratios S 12 exp and S 13 exp ; we show these data in Figs. 39a and 39b, respectively. These are the basis of our chi-squared, defined as where S 12 and S 13 have been combined into one vector S and V DB is the covariance matrix, which accounts for both statistical and systematic errors including correlations. We describe our estimation of the covariance matrix V DB in what follows. The number of signal events in any of EH1, EH2 or EH3 (from data), which we call s A , is given by the difference between the total number of events t A and the expected number of background events b A ; here, A (= 1, 2, 3) indexes the experimental hall. The uncertainty on t A is statistical; the event rates are large enough where they can assumed to Gaussian-distributed. The uncertainty on b A , however, is considered a systematic uncertainty; we discuss it further below. Therefore, the statistical uncertainty on a given element of s A , which we call s A i , is σ s A i = t A i . The statistical component of the covariance matrix is determined by randomly varying the s A i , calculating the ratios S 12 i = s 2 i /s 1 i and S 13 i = s 3 i /s 1 i separately for each bin i and determining at the covariance of the resulting pseudodata. The correlation between S 12 i and S 13 i is also recorded.
The collaboration provides a list of systematic uncertainties in Table VIII of Ref. [56]; in Ref. [64], it was reported that the uncertainty on the 9 Li background has been reduced from 43% to 27% in EH1 and EH2. We have considered the contribution of each of these components, accounting for the stated level of correlations between detectors, reactors and other systematics, using Monte Carlo methods: the fractional change in the event rate is calculated by varying each systematic and the covariance matrix of the resulting pseudodata is recorded. In practice, we find that the largest contributions to the covariance matrix come from (1) variation in the backgrounds, and (2) varying the detector response. Unsurprisingly, the uncertainty on the HM flux predictions is found to be negligible -this is precisely the reason why the ratios of event rates have been used in this analysis.
As previously mentioned, the absence of specific information regarding the true average reactor powers and live-times for each detector during each operation period means that our predicted ratios of spectra will not agree with the true ratios, even in the absence of a sterile neutrino. To compensate for this, we introduce two calibration factors, one for each spectral ratio, which are ad hoc attempts to reproduce Daya Bay data; these are introduced via the replacements S 12 → ζ 12 S 12 and S 13 → ζ 13 S 13 . These factors are determined by minimizing the above chi-squared function assuming only threeneutrino oscillations with the best-fit three-neutrino parameters in Ref. [47], and are found to be ζ 12 = 0.9933 and ζ 13 = 0.9973. A more accurate estimate of the sensitivity can be obtained using more accurate information regarding the cores and detectors.
The results of this analysis are shown in Fig. 28. The dark green, green and dark green contours represent the 95%, 99% and 99.9% C.L. contours, respectively. Relevant statistics are summarized in Table IX. Clearly, the evidence for a fourth neutrino from Daya Bay spectra is rather weak.

Implementation in GLoBES
Double Chooz consists of two monolithic, Gd-doped liquid scintillator detectors located in the vicinity of two commercial power reactors. The separations between the reactors and the detectors are taken from Fig. 1 of Ref. [53]; the distances are tabulated in Table XXIII. Given the O(100 − 1000) m baselines, we safely assume that both reactors are point-like. However, the detector is taken to have a width of 3 m; this helps to ensure that fast oscillations average out in our calculations, and is consistent with the physical width of the detector.
The collaboration has not published the (relative) average powers of the two reactors during the data collection periods for either the near or far detector; we assume that the average powers are the same, and comment on this further below. We again calculate the quantity F (q) as in Eq. (A.13), accounting for both reactors at each detector. This quantity is precomputed in steps of 0.01 in log 10 q; we calculate this for q ∈ [10 −3 , 10 1 ] for the near detector and q ∈ [10 −2.5 , 10 1 ] for the far detector. Each detector is separately defined within GLoBES, using the information contained in Table XXIII. We set @time to the operating time of the detector (in days), @power to 8.5 GW (assumed to be the same for the neat and far detectors) and @norm to be the efficiency. We set lengthtab to the corresponding value of 1/ L −2 , given by 0.4003 km and 1.052 km for the near and far detectors, respectively. The collaboration reports [53] that the near detector has 1.0042 ± 0.0010 times as many protons in its Gd-doped target as the far detector does, and 1.0045 ± 0.0067 times as many protons in its gamma catcher. Given the relative volumes of these two regions, we estimate that the near detector has 1.0044 times as many protons as the far detector. Consequently, we set $targetmass to be 1.0 for the far detector and 1.0044 for the near detector.
The energy spectrum in each detector is calculated in 26 bins between [1. and we again take the energy resolution to be 8%/ E[MeV]. Figure 4 of Ref. [53] shows the measured ratio of spectra at the Double Chooz near and far detectors; we reproduce these data in Fig. 38c in Appendix C. For our analyses, we have digitized these data and the corresponding statistical uncertainties. We also consider the following sources of systematic uncertainty:

Statistical Analysis
• Detection Uncertainties. These are enumerated in the third column of Table 2 in Ref. [53] and amount to a 0.475% uncertainty, assumed to be correlated between all energy bins.
• Flux Uncertainties. A 0.47% reactor-uncorrelated thermal power uncertainty has also been included. This results in a 0.4% uncertainty on the spectral ratio, assumed to be correlated between all energy bins.
• Backgrounds. Background rates at the near and far detector are given in Table 3 of Ref. [53] and the corresponding spectra are shown in Fig. 3 for both the near and far detectors. From these figures, we estimate the contribution of each background process to the background-subtracted event rate and the corresponding uncertainty using a simple Monte Carlo routine. This procedure yields the background-related uncertainty on the spectrum, including nontrivial correlations.
These are all included into a chi-squared function of the following form: where S exp and S pred are, respectively, the measured and predicted spectral ratios at Double Chooz. We note that we have reweighted the measured data by a factor i.e., by the ratio of the predicted numbers of events in either detector. The covariance matrix V DC contains the statistical and systematic uncertainties described above.
In the absence of published information regarding the relative powers of the two reactors over the operation of each detector, comparing S exp and S pred can be problematic -it's unclear what the appropriate threeneutrino-oscillation baseline should be. To address this, we reweight our predicted spectrum by a factor ζ DC , determined as follows. We calculate S pred assuming only three-neutrino oscillations using the value of sin 2 θ 13 measured by Double Chooz in Ref. [53], i.e, sin 2 2θ 13 = 0.105. We then replace S pred → ζ DC S pred and minimize over ζ DC ; we find ζ DC = 1.0026. We then include this factor in all subsequent calculations. The results of our calculation are shown in Fig. 29. 11 The dark green, green and dark green contours represent the 95%, 99% and 99.9% C.L. contours, respectively. Relevant statistics are compiled in Table IX. The evidence for a sterile neutrino at Double Chooz is weak -while the data show a clear deficit stemming from nonzero θ 13 , the data are sufficiently imprecise that any fine structure beyond the expected patter is washed out.

Implementation in GLoBES
Details of the geometry of NEOS are taken from Ref. [65]. The reactor core at unit 5 of the Hanbit Nuclear Power Complex, where NEOS is housed, is 3.1 m in diameter and 3.8 m in height. The detector is located in the plane 10.0 m beneath the center of reactor; labeling the center of core as the origin of our coordinate system and call the axis of the cylinder the z axis, the z coordinate of the detector is z = −10.0 m. The center-to-center distance is 23.7 m; the remaining coordinates of the center of the detector are taken to be x = +21.487 m and y = 0 m. The detector is 1.21-meter-long cylinder with diameter 1.03 m and is oriented so that its axis is in the xy plane, i.e., the axes of the detector and the core are orthogonal. Ref. [65] does not further specify the orientation of the detector axis in the xy plane, but the analysis is not particularly sensitive to this detail, given the size of the detector. For concreteness, we orient the detector along the x axis.
We estimate the distribution of baselines (more concretely, the distribution of 1/L 2 values between points in the core and the detector) using a simple Monte Carlo routine. The effective length to be L eff ≡ 1/ L −2 = 23.69 m. These distances are also used to calculate the function F (q) (Eq. A.8) over the range q = 10 1 , 10 4 in steps of 0.01 in log 10 q.
NEOS, itself, requires one experiment file within GLoBESfit. The NEOS collaboration reports a prompt energy spectrum on [1.0, 10.0] MeV. We choose to ignore the last bin, covering [7.0, 10.0] MeV, in our analysis; there are very few events in this bin, so we do not expect to lose much information. The NEOS collaboration presents their data in terms of a ratio relative to the antineutrino spectrum measured at the Daya Bay near detectors in Ref. [55]; see Fig. 3(c) of Ref. [65]. 12 The spectrum that we use to normalize the NEOS result is the sum of the antineutrino spectra in AD1 and, AD2, i.e., the detectors in EH1. The ratio of NEOS data relative to the Daya Bay spectrum is shown in Fig. 38d in Appendix C.
Since we are again interested in a ratio of spectra, @time, @power, @norm and $target mass are all fixed to be 1.0. We use the same parametrization of the energy resolution at NEOS as Ref. [70]; however, since GLoBES assumes all energies are in GeV, we write the resolution as Lastly, we fix $lengthtab = 0.02369 km, as described above. The spectrum is calculated for each of the four main fissile isotopes, and these are added together, weighted by the relevant fuel fractions [65,70]: (f 235 , f 238 , f 239 , f 241 ) = (0.655, 0.072, 0.235, 0.038) .

Statistical Analysis
The object in which we are ultimately interested is the double ratio of NEOS and Daya Bay spectra, to wit, The collaboration also publishes the ratio of their data relative to the HM fluxes; see Fig. 3(b) of Ref. [65]. Given the size of the theoretical uncertainties on the flux predictions and the unexplained spectral features at 1 MeV and 5 MeV, we consider the ratio of measured spectra, even though these have been measured by two different experiments.
where S A nν, i is the number of antineutrino events calculated using GLoBES for experiment A (= NEOS; DB, EH1) under the assumption of the existence of n neutrinos in bin i, defined using NEOS's 0.1-MeV binning. The justification for this is as follows.
As mentioned, the NEOS collaboration normalizes their spectrum relative to the Daya Bay measured flux. The latter has been inferred from the Daya Bay 1230day data after unfolding three-neutrino oscillations. To properly account for the presence of a fourth neutrino in this denominator, one must first refold the best-fit threeneutrino oscillations before unfolding four-neutrino oscillations, with some assumptions about the new mass and mixing. This amounts to multiplying by the ratio of the expected numbers of events in a given bin at the Daya Bay near detectors with and without a sterile neutrino. Additionally, because the Daya Bay prediction has been rescaled to the NEOS data, there should also appear a factor of the ratio of oscillation probabilities at NEOS with and without oscillations. We are thus led to consider Eq. (V.12), which has been previously considered in Refs. [12,71].
We use EH1 from Daya Bay to provide the rescaling of the measured flux in Eq. (V.12) in our calculations. The oscillation probabilities at NEOS are calculated assuming that only oscillations into the sterile state are relevant, but the full four-neutrino machinery is applied to Daya Bay. The following chi-squared is used in order to study the extent to which NEOS can probe the existence of a sterile neutrino: where S exp are the NEOS data, S pred is the ratio calculated using GLoBES and V N is the covariance matrix of these data. As with DANSS, we introduce an explicit nuisance parameter ξ N for the energy scale uncertainty, whose uncertainty is nominally 0.5% [65]. This is studied using the GLoBES function glbShiftEnergyScale; in our fits, this parameter is minimized for every point in the sterile neutrino parameter space.
We turn now to the covariance matrix V N . There are two contributions that we include in this matrix. The first is the statistical uncertainties on the ratio S exp , which are simply read off of Fig. 3(c) in Ref. [65]. The second is the covariance matrix published with the antineutrino spectrum in Ref. [55]. Daya Bay publishes their results using a different binning than NEOS; we reformulate the covariance matrix using a simple Monte Carlo routine. Random antineutrino spectra are generated using the Daya Bay spectrum and covariance matrix, including correlations. The antineutrino spectrum that NEOS would see is determined by interpolating these generated spectra to the stated NEOS binning. The contents of these bins using the above-stated energy resolu- tion, and the resulting covariance matrix is calculated. Formally, we need to consider the predicted spectrum given the stated NEOS fuel fraction, not the Daya Bay fuel fraction from which this result is derived. This introduces model dependence on the HM fluxes -and with it, dependence on the uncertainties on these fluxes. These contributions have been included with a similar Monte Carlo routine, but their contribution to the final result is small [70].
It bears mentioning that the use of the Daya Bay antineutrino spectrum to normalize the NEOS spectrum implies that these experiments are formally correlated with one another. However, we ignore the correlations between these experiments for the following reason. The Daya Bay antineutrino spectrum used at NEOS is based on the 1230-day data; in contrast, the Daya Bay analysis described in Sec. V.3 is based on the 1958-day data. Consequently, it's difficult to accurately assess the degree of correlation between these data sets. We hope to improve this aspect of our analysis with the collection and dissemination of more data.
The results of this analysis are shown in Fig. 30 The dark green, green and dark green contours represent the 95%, 99% and 99.9% C.L. contours, respectively. Relevant statistics are compiled in Table IX. The sensitivity at low ∆m 2 41 ( 0.3 eV 2 ) is driven by the use of the Daya Bay spectrum for normalization; NEOS itself drives the sensitivity in the ∼ 1 eV 2 region. One could, in principle, develop an analysis based on the ratio of NEOS data to the HM flux prediction, the data for which are shown in Fig. 3(b) of Ref. [55]; this is precisely what is done in, for instance, Ref. [61]. The exclusions derived there only have support in the region around ∼ 1 eV 2 , but, as mentioned above, are more dependent on the uncertainties of the HM fluxes.

Implementation in GLoBES
The geometry of the RENO experiment [72] has been previously discussed in the context of our rate analysis. We continue to assume that each of the six reactors is point-like, but that both the near and far detector have a width of 3.0 m, to facilitate fast oscillations in averaging out. The distances between either detector and each of the six reactors are shown in Table XVIII; the average powers of each reactor during the operating period of either detector are shown in Table XIX. These data have been obtained from private communications with the collaboration [58]. As with previous experiments, we precalculate F (q) (Eq. (A.13)) on a grid with spacing 0.01 in log 10 q over the range q ∈ [10 −2.5 , 10 1 ] for both detectors.
The near and far detectors are defined as separate experiments within GLoBES. For each, we set @time to the operating time of each detector (in days), @power to the total thermal power from all six reactors and @norm to be the efficiency of each detector. The relevant quantities are tabulated in Table XIX. In the absence of any information to the contrary, we assume the detectors to be of equal mass; consequently, we set $target mass to be 1.0 for each. We set $lengthtab equal to the appropriate value of 1/ L −2 for the near and far detectors; these values are, respectively, 433.1 m and 1446.9 m.
The antineutrino spectrum is calculated in 29 bins of equal width on [2.1, 7.8] MeV in antineutrino energy, corresponding to [1.3, 7.0] MeV of prompt energy. The RENO data, however, are presented with nonuniform binning; the third-and second-to-last bins have width 0.2 MeV and the last has width 0.3 MeV, whereas all the others have width 0.1 MeV. In our calculations, we manually combine events in our 0.1-MeV wide bins as appropriate to achieve the correct final binning. The effective fission fractions of the four main fissile isotopes for each detector have been provided by the collaboration [58]; for the near detector, we use

Statistical Analysis
Our analysis centers around the spectral ratio presented in the bottom panel of Fig. 2 of Ref. [72]. These data and the corresponding statistical uncertainties have been digitized for use in our analyses; we reproduce these data in Fig. 38e in Appendix C. Aside from these statistical uncertainties, we consider the following sources of systematic uncertainty: • Energy Scale. The collaboration claims a 0.15% uncertainty on the experimental energy scale [72]. We include this in our analyses via a nuisance parameter.
• Flux Uncertainty. A 0.9% uncertainty on the thermal power of each reactor has been included. Assuming that this is totally uncorrelated between cores, we estimate that this implies a ∼ 0.3% uncertainty on the far-to-near ratio.
• Backgrounds. Table 1 of Ref. [72] gives the estimated total background rates at the near and far detectors, and the insets to Fig. 1 of the same reference show the background spectra. We have digitized these event spectra in order to estimate the contributions of the backgrounds to the spectral ratio via Monte Carlo. The resulting uncertainties and correlations are recorded for use in our analyses.
Once more, we employ a chi-squared function of the form where S exp and S pred are the experimental and predicted IBD spectra, respectively. We reweight the measured data by a factor of to ensure consistency in our GLoBES analysis. The covariance matrix V R includes the statistical and systematic uncertainties described above, neglecting the energy scale uncertainty, including correlations. The nuisance parameter ξ R describes adjustments to the energy scale at RENO, accounting for its uncertainty σ R = 0.15%.
The effect of the energy scale shift is included through the GLoBES function glbShiftEnergyScale. In our calculations, we minimize χ 2 RENO with respect to this nuisance parameter for every point in the sin 2 2θ ee -∆m 2 41 plane. In an attempt to faithfully reproduce this experiment in GLoBES, we reweight our predicted spectrum by an ad hoc factor ζ R , determined as follows. We have simulated the IBD spectra at RENO assuming the best-fit value of sin 2 2θ 13 determined in Ref. [72], i.e., sin 2 2θ 13 = 0.0896. We then replace S pred → ζ R S pred , and minimize over ζ R . We find the minimum at ζ R = 1.0088. We then use this value of ζ R in all subsequent calculations.
The results of our calculation are shown in Fig. 31. 13 Relevant statistics are compiled in Table IX. The evidence for a sterile neutrino from RENO is relatively weak.

VI. SPECTRAL ANALYSES
In this section, we present and discuss our analyses of the experiments presented in the previous section. In particular, we consider several groupings of these experiments and assess the significance of the evidence for the existence of a sterile neutrino.

VI.1. Aggregating Results
In Table IX, we combine relevant statistical quantities from our sterile-neutrino analyses of the experiments described in Sec. V. We also tabulate the locations of the best-fit points in the sin 2 2θ ee -∆m 2 41 plane for each experiment; these suppressed in Figs. 26-31 for clarity.

VI.2. Combined Sterile Neutrino Analysis
The exclusion contours from a combined analysis of these spectral measurements is shown in Fig. 32. Since we have assumed correlations between these experiments are small, this figure is equivalent to the naive additional of the χ 2 maps in Figs. [27][28][29][30]. Relevant statistics are compiled in the penultimate line of Table IX. At the bestfit point, we find χ 2 min = 175.2; given the three-neutrino value χ 2 3ν = 188.2, this corresponds to p = 1.6 × 10 −3 , or roughly 3.2σ.
While introducing a sterile neutrino has significantly improved the fit, the three-neutrino hypothesis is not obviously insufficient -for 212 degrees of freedom, the above value of χ 2 3ν corresponds to p = 0.88. All of Bugey-3, Daya Bay, Double Chooz and RENO have values of χ 2 3ν that are less than the number of data points used in each analysis; simply put, these experiments do not convey a need to introduce a sterile neutrino. On the other hand, DANSS and NEOS show a strong preference for a sterile neutrino, as previously discussed (particularly the former); these experiments drive the global preference for a sterile neutrino, as has been previously observed [12,61,71]. The best-fit point in Fig. 32 is at (sin 2 2θ ee , ∆m 2 41 ) = (3.80 × 10 −2 , 1.26 eV 2 ), which is roughly consistent with the best-fit point from the spectral analyses in Refs. [12,61]. The best-fit points for each analysis are represented by shapes of the corresponding color.

VI.3. Additional Analyses
In this subsection, we discuss particularly interesting subsets of the spectral-ratio data and the extent to which these indicate the possible existence of a sterile neutrino.
Calibration: Measuring sin 2 2θ13 and ∆m 2 31 We begin this section by discussing how we verify that our GLoBES analyses of Daya Bay, Double Chooz and RENO reproduce the experimental measurements of the three-neutrino parameters sin 2 2θ 13 and ∆m 2 31 in the absence of a sterile neutrino. We scan over these parameters and calculate the χ 2 precisely as described above; the results are shown in Fig. 33. The dark (light) regions represent the 95% (99%) C.L. contours derived for each of Daya Bay (red), Double Chooz (purple) and RENO (blue). The gray curves represent a combined analysis of all three of these.
These regions are broadly consistent with the best-fit values found in the analyses of the respective collaborations [53,64,72]. 14 This figure can also be compared 14 We elect to frame this analysis in terms of ∆m 2 31 instead of ∆m 2 ee . The precise relationship between these two quantities has  against Fig. 5 of Ref. [47], where a combined analysis of these medium-baseline experiments has also been performed. We again find general agreement between the calculated regions. The correspondence is not exact -for instance, the region preferred by Double Chooz is slightly wider here than the corresponding region in Ref. [47] but given that these fits have been performed with often incompletely reported information, we consider this a modest success. One may worry that introducing a sterile neutrino may cause a significant mismeasurement of sin 2 2θ 13 at reactor experiments. We have investigated how sensitivity to a sterile neutrino changes if sin 2 2θ 13 is unfixed from its best-fit value, and how introducing a sterile neutrino may shift the preferred value of sin 2 2θ 13 . 15 We find, however, that this is not the case. The measurement of sin 2 2θ 13 is driven primarily by Daya Bay (see Fig. 33), whereas the measurement of sin 2 2θ ee is driven primarily by DANSS, and that these measurements are largely uncoupled when performed simultaneously. This validates our fixing sin 2 2θ 13 in our analyses above.

DANSS and NEOS
Given that DANSS and NEOS drive the sensitivity to a sterile neutrino in our global fit, it seems pertinent to ask how the combination of just these two experiments constrains the sin 2 2θ ee -∆m 2 41 plane. The resulting 95%, 99% and 99.9% C.L. contours are shown in dark green, green and light green, respectively, in Fig. 34. Relevant statistics are compiled in Table IX in the row headed "DANSS + NEOS." been debated in the literature [73,74], but this is not relevant for our purposes here. 15 Given that accelerator neutrino experiments also provide a measurement of ∆m 2 31 , and that these measurements are consistent with the value measured at reactors, we leave this parameter fixed at its best-fit value for this study. Restricting to these two experiments has resulted in p = 9.2 × 10 −4 , corresponding to 3.3σ. Notice, however, that the value of the chi-squared at the best-fit point, χ 2 min , is 84.9. For 82 degrees of freedom, this corresponds to p = 0.39; this is an acceptable fit, but hardly compelling evidence for the existence of a sterile neutrino. Moreover, the best-fit points from our analyses of DANSS and NEOS do not coincide, and the best-fit point from their combination lies close to that from DANSS alone. Consequently, we have investigated the compatibility of these data sets using a parameter goodness-of-fit test [75]. We define where χ 2 A is the minimum value of the chi-squared for experiment A. We find χ 2 PG = 6.4; assuming this is chi-squared distributed implies p = 4.0 × 10 −2 . These data sets seem to be in mild tension. While parameter goodness-of-fit values should be viewed with some skepticism (see, for instance, Ref. [16]), it bears mentioning that the evidence for a sterile neutrino from this experiments, while modestly strong, is not entirely overwhelming.

Daya Bay and NEOS
Given the manner in which we have been forced to include NEOS -namely, as a ratio with respect to Daya Bay's EH1 -one could reasonably object that showing results from NEOS on its own has stripped this analysis of valuable context. To this end, we have performed a combined analysis of Daya Bay and NEOS, the results of which we show in Fig. 35. The resulting 95%, 99% and 99.9% C.L. contours are shown in dark green, green and light green, respectively, and relevant statistics are compiled in Table IX in the row headed "Daya Bay + NEOS." While the best-fit point to the NEOS data unto itself lies in the region around ∆m 2 41 ∼ 10 −1 eV 2 , this part of the parameter space is disfavored by the medium-baseline experiments, including Daya Bay. Combining Daya Bay and NEOS makes this apparent; while this part of the parameter space is still allowed, much more of the sterile neutrino parameter space is included in the at 95% C.L.

region.
As mentioned in Sec. V, we have ignored possible correlations between Daya Bay and NEOS in our analysis because they are hard to quantify exactly. Consequently, this analysis is tantamount to adding together the individual χ 2 maps over the sterile neutrino parameter space. We note, however, that accounting for these correlations is one way in which this analysis can be strengthened in the futurel e look forward to improving this treatment in future versions of GLoBESfit.

Analyzing Modern Experiments
In our analyses of rate experiments, we considered how considering only experiments from the 2010s alters the evidence for a sterile neutrino. In the interest of parity, we perform a similar analysis here. For the experiments that we consider, this is tantamount to removing Bugey-3 from the fit.
We show the results of this analysis in Fig. 36. As before, the 95%, 99% and 99.9% C.L. contours are shown in dark green, green and light green, respectively. Relevant statistics are compiled in the last line of Table IX. It is clear that removing Bugey-3 from the fit has not dramatically changed the best-fit region (cf. the dark green region in Fig. 32). In fact, the fit is slightly stronger for Bugey-3's omission -the preference for a sterile neutrino rises to 3.3σ which, while not much higher than 3.1σ, is an increase relative to the analysis of all experiments. The reason for this is clear from Fig. 26: Bugey-3 indicates no preference for a sterile neutrino at ∆m 2 41 = 1.26 eV 2 . In fact, our analysis implies that the best-fit point from our analysis of modern experiments is less favored than zero mixing by the Bugey-3 data.
We emphasize that we know of no concrete reason why any experiment performed before 2010 (in this case, Bugey-3) should be disregarded in any of the fits we have performed. We mean merely to demonstrate that if one were suspicious of these data, for whatever reason, then their inclusion has not dramatically enhanced the preference for a sterile neutrino -the evidence is, in fact, largely driven by experiments from the past decade.

Energy Scale Uncertainties at Bugey-3, DANSS and NEOS
An important source of systematic uncertainty in these experiments is the energy scale of the detector. While the energy scale of liquid scintillator detectors is nonlinear, we assume in our analyses that the energy response is linear, and that the only uncertainty is on the slope of this energy response. This is precisely the meaning of the nuisance parameters ξ 15,40,D,N in Sec. V.
Loosely speaking, the uncertainty on the energy scale translates into uncertainty on the inferred value of ∆m 2 41 ; a systematic shift in reconstructed values of E ν induces a corresponding offset in the measured value of ∆m 2 41 . If the energy scales were changed at these experiments, then the fringe-like structures in the resulting confidence level contours would similarly be shifted up and down in Figs. 26-31.
As we have discussed, the evidence for a sterile neutrino is driven by the combination of DANSS and NEOS. This can be seen from Figs. 27 and 30 by looking at the fringes in the confidence level contours: the best-fit point resides in the region where the patterns of fringes line up and leave a gap in the sterile neutrino parameter space. One could then ask if the patterns of fringes can be moved relative to one another to produce an allowed region elsewhere. Conversely, we have also seen that Bugey-3 presents a mild challenge to the sterileneutrino interpretation of modern spectral experiments. One could also ask if allowing the energy scale to vary at Bugey-3 allows for this tension to be mitigated.
Unsurprisingly, we find that shifting the energy scales within their stated error budgets does negligibly little to address either of these concerns -clearly, this would have already appeared in our analyses, wherein the energy scales are allowed to vary as nuisance parameters. One could, however, ask how increasing these uncertainties modifies our analyses. We have repeated these analyses with the energy scale uncertainties all increased to 20%. 16 Even here, we find that the resulting patterns of 16 We do not claim that any of these experiments has underreported their energy scale uncertainty. We mean merely to isolate the effect of this particular contribution to the overall error budget.
fringes have not appreciably moved in the ∆m 2 41 direction relative to the nominal analyses. Therefore, we are led to conclude that energy scale uncertainties do not contribute meaningfully to the preference for a sterile neutrino in these experiments.

VII. STERILE NEUTRINOS AND THE 5 MEV BUMP
The presence of an unexplained spectral feature at 5.0 MeV in the absolute prompt energy spectrum, independently observed by several experiments [22][23][24], has caused significant interest in recent years. In a sense, the so-called 5 MeV bump is a microcosm of the current situation regarding reactor antineutrino anomalies: there is a lingering suspicion that some unaccounted-for nuclear physics effect is operative, but precisely how this filters into the hunt for sterile neutrinos is unclear. In Ref. [76], it was proposed that the bump could arise from the process 13 C(ν, ν n) 12 C in the presence of some new interaction. While this interaction could, in principle, explain the excess, it was found that concrete models of potential new physics that could provide such an interaction are already strongly constrained. This all suggests that nuclear physics effects are a much more likely explanation of the bump.
Still, if the bump is indeed the result of a misunderstanding of the antineutrino flux, then this does not imply that a sterile neutrino does not exist. However, it is important to understand the effect that the bump has on searches for sterile neutrinos; this is the subject of this section. In the first subsection, we study a subset of the global dataset indicating the existence of the bump to determine how strong the evidence for this bump is. In the second, we present sterile neutrino analyses of the global reactor dataset with a phenomenological parametrization of the bump, and discuss the results.

VII.1. The Evidence for the Bump
We begin by considering the most relevant subset of the evidence for the existence of the bump. In particular, we consider the following three measurements: 1. The measurements of the isotopic prompt energy spectra for 235 U and 239 Pu inferred from Daya Bay burn-up measurements in Ref. [29].
2. The fractional excess of events in the region [3.8, 7.0] MeV (prompt energy) relative to the HM flux predictions, published by RENO in Fig. 5    We make extensive use of the supplementary material published by the Daya Bay and PROSPECT collaborations, as pertains to these measurements. For the RENO data, we have digitized Fig. 5 of Ref. [59] for our analysis. More specifically, Daya Bay and PROSPECT have published covariance matrices for their measured spectra, including nontrivial correlations; for RENO, we assume that the data are uncorrelated in the absence of concrete information.
We ask how well these data are described by the HM fluxes and determine the extent to which introducing a bump affects the quality of this fit. To these ends, we use the Markov Chain Monte Carlo (MCMC) package emcee [77] to perform two types of analyses. In the first, we allow only the magnitudes of the isotopic fluxes of 235 U and 239 Pu to vary -multiplied by quantities r 235 and r 239 , respectively -while maintaining the shapes given by the HM fluxes. In the second, we add a separate Gaussian feature to each of these isotopic fluxes. The Gaussians themselves are normalized to unit area, though we multiply them by n 235 and n 239 , respectively; we simplify our analysis by fixing the means of these Gaussians to 5.8 MeV antineutrino energy (corresponding to 5.0 MeV prompt energy), and we assume they share a common width σ bump . We scan over the appropriate parameter spaces and determine the posterior probability densities for each case, and compare the maximum likelihoods.
We have considered two combinations of the data mentioned above. For the first, we simultaneously analyze Daya Bay and RENO; for the second, we add PROSPECT into the analysis. Table X shows relevant statistics from our analysis performed in the absence of a bump. We present the maximum value of the likelihood L in terms of the minimum value of −2 log L. Shown also are the number of degrees of freedom n dof , and the p-value derived assuming (−2 log L) min is Gaussianwe reiterate that we do not include this experiment as a part of our sterile neutrino exclusion. distributed. Moreover, Table XI shows the same quantities for our analyses in which bumps in the isotopic fluxes are introduced. Clearly, the current data greatly prefer this sort of bump over the pure HM fluxes.
More of the structure of the fits in which a bump is invoked is shown in Figs. 40 and 41 in Appendix C for our analyses without and with PROSPECT, respectively. In the two-dimensional planes, the blue, orange and red contours represent the 68.3%, 95% and 99% credible regions (C.R.), respectively. Additionally, the one-dimensional plots also show the 90% C.R. curve in green. Firstly, both analyses prefer the HM fluxes to be rescaled downward in magnitude, particularly for r 235 . Secondly, while the data are largely consistent with the absence of a bump in 239 Pu, nonzero n 235 is preferred at 7σ -the data unequivocally prefer a spectral distortion.
It is interesting that the fit to Daya Bay and RENO in the presence of a bump is fairly high quality (p = 0.70) but degrades quite steeply (p = 1.2 × 10 −2 ) when PROSPECT is introduced. There are two primary reasons for this. Firstly, the Daya Bay and RENO analyses prefer for the absolute IBD yields from 235 U and 239 Pu to be scaled down by ∼ 10%. 18 On the other hand, the PROSPECT measurement of the magnitude of the 235 U flux is consistent with the HM prediction. One can see this from Figs. 40 and 41: adding in PROSPECT shifts the central value of r 235 considerably. Secondly, the PROSPECT data, unto itself, does not fare particularly favorably in the face of the HM+bump hypothesis. In this case, we find (−2 log L) min ≈ 55; for n dof = 29, this leads to p = 2.5 × 10 −2 . It bears mentioning that PROSPECT is still collecting data -this discrepancy may well be resolved with increased statistics and improved analysis techniques.

VII.2. Sterile Neutrino Analyses with the Bump
We turn now to the issue of how the inclusion of a bump in the isotopic flux of 235 U alters the evidence for the existence of a sterile neutrino. We will consider both rate and spectral analyses in turn.
We begin by considering the effect of the 5 MeV bump on our rate analyses in Secs. III and IV. Antineutrinos in the range [5.3, 6.3] MeV are predicted to constitute 13.5% of IBD events at these experiments. If the 235 U flux is augmented with a Gaussian bump of the sort indicated by our Daya Bay/RENO/PROSPECT analysis, i.e, a Gaussian bump with height n 235 = 5.763×10 −3 (cf. Fig. 41), then this factor becomes 14.4% -a 0.9% increase. Moreover, since many of the experiments considered here have fission fractions of the order f 235 ≈ 0.5, we see that adding this sort of Gaussian bump increases the predicted IBD rate by 0.5%. This will not be enough to explain the entire deficit with respect to the HM prediction, as we will see.
On the other hand, we have the measurements of ratios of spectra in Secs. V and VI. As argued previously, we expect these measurements to be largely insensitive to the details of the antineutrinos flux. Therefore, we do not expect the 5 MeV bump to significantly affect searches for sterile neutrinos in this channel, either.
We have confirmed this suspicion by calculating the distribution of χ 2 over the sin 2 2θ ee -∆m 2 41 plane -precisely as has been done previously -with the additional contribution from a Gaussian bump in 235 U. We vary the height of the bump between 0.0 and 10 −2 in the units of Figs. 40 and 41, and have determined that the minimum value of the χ 2 changes by less than one unit over this range. Specifically, we find χ 2 min (n 235 = 0) = 174.8 19 and χ 2 min (n 235 = 10 −2 ) = 175.5. The bulk of this difference is attributable to NEOS, given their unconventional approach to normalizing their spectrum. Moreover, we find that the statistical significance, in terms of nσ, does not vary meaningfully from 3.1σ.
We are led to conclude that the presence of the 5 MeV bump does not substantively impact searches for sterile neutrinos at reactor experiments, because (1) the contribution of events contained within the bump is at the sub-percent level, and (2) spectral ratios are, by design, largely insensitive to the particulars of the flux model employed.

VIII. REGARDING NEUTRINO-4
Neutrino-4 [78,79] is a liquid-scintillator antineutrino detector located at the SM-3 reactor in Dimitrovgrad, Russia. The moveable detector covers a range of distances -between 6 and 12 m -from a 100 MW core. Recently, the collaboration has reported [78] results that imply 3.5σ evidence for the existence of a sterile neutrino; preliminary results presented in December 2019 [79] suggest that this feature persists with the collection of more data.
The data currently available to the public are insufficient to allow us to attempt to replicate the collaboration's analysis. However, we can get a sense for how compatible Neutrino-4 is with the other experiments that we have considered by overlaying their respective allowed contours. We show precisely this in Fig. 37. The dark green, green and light green curves are, respectively FIG. 37: A comparison between our spectral results, recent results from Neutrino-4 [79] and a global analysis of tritium-endpoint experiments [80]. We show our 95%, 99% and 99.9% C.L. contours in dark green, green and light green; the 1σ, 2σ and 3σ C.L. contours from Neutrino-4 in dark blue, blue and light blue; and the 90% and 99% C.L. contours for tritium experiments in solid and dashed black. the 95%, 99% and 99.9% C.L. contours from our combined analysis of measured spectral ratios from Bugey-3, DANSS, Daya Bay, Double Chooz, NEOS and RENO. The dark blue, blue and light blue curves are the 1σ, 2σ and 3σ allowed contours shown on slide 30 of Ref. [79]; we caution that these results are preliminary, but proceed regardless. We also show the 90% (solid black) and 99% (dashed black) C.L. contours from the combined analysis of the Mainz, Troitsk and KATRIN tritium-endpoint experiments presented in Ref. [80], where a similar comparison has been performed on the basis of Neutrino-4 results from Ref. [78].
As observed in Ref. [80], the best-fit point from Neutrino-4 (sin 2 2θ ee = 0.31, ∆m 2 41 = 7.32 eV 2 ) is disfavored, to a moderate degree, by the tritium-endpoint experiments. Similarly, the best-fit point to Neutrino-4 data is inconsistent with the region preferred by reactor spectral ratios. Moreover, while not shown in Fig. 37, the Neutrino-4 best-fit point is also disfavored by the integrated rate measurements presented in Sec. VI -in particular, see Fig. 22. The degree of (in)compatibility has been tabulated in Table XII. For each of our three rate analyses and for our spectral analysis, we show the difference in χ 2 between our best-fit point and that of Neutrino-4 (∆χ 2 N4 ), as well as the corresponding p-value and the equivalent number of σ.
Clearly, none of the four analyses are particularly tolerant of this result from Neutrino-4. Our spectral anal-  ysis disfavors the best-fit point at the 3.3σ level, but all three rate analyses disallow this point at 4σ. It is perhaps imprudent for us to focus solely on the best-fit point from Neutrino-4: significant portions of the 3σ-preferred parameter space cannot be firmly refuted by the tritiumendpoint experiments or by our analyses. However, in the absence of enough information to perform our own analysis of Neutrino-4, it is difficult to precisely discuss the global compatibility of these results. Interestingly, the updated analysis of Ref. [79] contains a 3σ-preferred region around ∆m 2 41 ∼ 1 eV 2 that was not present in Ref. [78]. If confirmed in an official publication, then this would be broadly consistent with the best-fit region from our analysis of spectral ratios. It is our hope that an extensive data release will allow us to study Neutrino-4 in more depth in the future.

IX. CONCLUSIONS
In this work, we have studied the evidence for the existence of sterile neutrinos contained in the global reactor antineutrino dataset using a new open-source, GLoBESbased toolkit that we call GLoBESfit. We have demonstrated that inferences from IBD rate measurements are confounded by the necessary dependence on a particular flux model. In particular, while the evidence obtained using the traditional HM fluxes is at the level of 2.3σ, updated ab initio fluxes imply < 1σ significance, while updated spectral-conversion fluxes yield 2.6σ significance. This discrepancy can only be resolved with continued improvements to predictions of reactor antineutrino fluxes, which in turn rely on the collection of improved data. Ratios of measured IBD spectral, on the other hand, seem to paint a more optimistic picture: taken together, the current dataset suggests that the evidence rises to the level of 3.2σ. This is certainly tantalizing, but is hardly ironclad. Therefore, we are led to conclude that that issue of the existence of additional species of neutrinos cannot be resolved one way or another with current data. We have also addressed the 5 MeV bump, and have found that the presence of this feature does not dramatically alter inferences about sterile neutrinos.
In addition to presenting our results, this manuscript is also intended to document GLoBESfit. This software is available for download from either GitHub [81] or from our dedicated website [82]. The function of making GLoBESfit publicly available is twofold: 1. Publishing our code allows for members of the community to levy informed commentary on our techniques. Global fitting in an inherently tricky business; reproducing an experiment's results can be challenging for those not involved in the collaboration. Invariably, there are aspects of this program that can be improved. Our hope is that public input, over time, will reinforce this work.
2. Though we intend to revisit and expand on this work in the future, it is not possible for us to test every interesting physical hypothesis that could explain current neutrino anomalies. We have focused on the 3+1 scenario because of its simplicity, but Nature could certainly be far more interesting than this. If a user is interested in testing a particular model, then the tools we have developed ought to allow them to at least start that process. Because GLoBESfit is developed using pre-existing GLoBES architecture, it should lend itself to much a wider set of applications than the relatively simple scenario we have considered here.
Our hope is that making these tools available contributes meaningfully to the wider neutrino physics community.

Regarding The Use of Statistics
A common sin in global analyses is assuming that ∆χ 2 is chi-squared distributed. This issue has been discussed in detail in the literature; see, for instance, Refs. [83][84][85]. Analyses of spectral ratios are particularly susceptible to this statistical idiosyncrasy, as is demonstrated nicely in Fig. 5 of Ref. [84]; we direct the interested reader to this work for more details. The primary issue is that while the expected ratio of spectra is flat in the absence of a sterile neutrino, statistical noise invariably results in a preference for oscillations with some nonzero amplitude. Failing to take this into account yields final results that are overconfident. In this work, we have been conservative in talking about the statistical significance of our findings, particularly those in Sec. VI. While the significance we have presented here ostensibly rises to the ∼ 3σ level, the true significance is certainly less than this. The actual preference for a sterile neutrino in the global reactor dataset, accounting for this effect, remains to be determined.
The difficulty is that the precise relationship between ∆χ 2 and a p-value depends strongly on the experiment(s) under consideration. How ∆χ 2 is distributed -and thus the p-value associated with a particular measurementcan be determined numerically as a function of sin 2 2θ ee and ∆m 2 41 using Bayesian methods; this is the basis of the Feldman-Cousins technique [83]. However, this requires the ability to reliably simulate the experiment(s) under consideration; the capacity to do so is limited for those not involved in a given collaboration. Even with perfect simulation capabilities, however, this sort of analysis can be prohibitively computationally demanding, particularly if considering multiple experiments simultaneously. We hope to address these concerns in future releases of GLoBESfit, but for the time being, we are left wanting for a more sophisticated statistical treatment. 20 While we believe that the results we have presented here are merited in their own right, our hope is that presenting the results as we have -and acknowledging the ways in which the underlying treatment is deficient -is sufficient to promote dialogue between experimentalists and theorists on how to best employ the available data. It is incumbent on theorists to ensure that experimental data is used correctly in their analyses; it is incumbent on experimentalists to follow good data preservation practices. Decisions for how to best commit the neutrino community's finite resources ultimately rely on understanding what is (and is not) contained in current data.
Acknowledgments In this appendix, we outline how oscillations involving sterile neutrinos are calculated using existing GLoBES machinery. We begin by reviewing the basics of oscillations with sterile neutrinos before turning to our implementation thereof. We stress that this machinery can be modified to implement any new-physics scenario that 20 As this work neared completion, Ref. [86] appeared on the preprint arXiv. In this work, the author performs precisely the sort of analysis advocated above. They find that such a Feldman-Cousins correction diminishes the overall significance from 2.4σ to 1.8σ. We note that the author has included data from PROSPECT [87], as well as preliminary, updated data from DANSS [88], but not from Daya Bay, Double Chooz nor RENO; this explains the difference in significance relative to our findings here.
the user wishes to probe; our discussion of how to create custom oscillation engines and probability matrices is completely generalizable.

Oscillations with a Sterile Neutrino
The generic expression for the vacuum oscillation probability P αβ -the probability for a neutrino produced in the flavor eigenstate ν α to be observed in the flavor eigenstate ν β (α, β = e, µ, τ, s) -is given as follows: where U αi (α = e, µ, τ, s; i = 1, 2, 3, 4) are the elements of the leptonic mixing matrix, L is the distance propagated (i.e., the baseline), E ν is the neutrino energy and we define ∆m 2 ij ≡ m 2 i −m 2 j , with m i the neutrino masses. For oscillations involving antineutrinos, we make the replacement U → U * ; this is equivalent to changing the sign of the last term in Eq. (A.1).
There are only four relevant elements of the leptonic mixing matrix, for our purposes here. We parametrize them as follows: one can easily verify the unitarity constraint 4 i=1 |U ei | 2 = 1. In our analyses, we have assumed that the values of θ 12 and θ 13 determined from global fits to oscillation data assuming three-neutrino oscillations [47] can be applied without alteration. Of course, what experiments actually measure is (some combinations of) the elements of the leptonic mixing matrix. Consequently, a determination of, say, |U e2 | 2 is absolute, whereas the inferred value of θ 12 will depend on values of θ 13 and θ 14 inferred elsewhere. However, as long as θ 14 is small -which we find to be the case in our fitsthe implied modifications to the three-neutrino mixing angles are also small.
When (anti)neutrinos propagate through matter, the background potential from ambient protons, neutrons and electrons modifies their propagation. The effects of this matter potential can be included via modifications to the mixing angles and mass-squared differences. Moreover, GLoBES does contain the functionality to calculate neutrino propagation in a matter background. Given the low energies under consideration here, however, we as-sume that the vacuum-oscillation formalism is sufficient.
For short-baseline experiments -experiments with baselines O(100) m -with point-like sources and a point-like detectors, we use the two-flavor approximation for the electron-type survival probability, P ee : where sin 2 2θ ee = 4|U e4 | 2 (1 − |U e4 | 2 ) = sin 2 2θ 14 is the effective active-sterile mixing angle. We work in terms of sin 2 2θ ee even though this is equal to sin 2 2θ 14 in our parametrization; the effective angle sin 2 2θ ee is parameterization independent and thus more appropriate for comparison with the broader literature. For medium-baseline experiments, this two-flavor formalism is insufficient -we consider the full, four-flavor formalism. The electron-type survival probability is instead written as this can be immediately derived from Eq. (A.1) by setting α = β = e.
For the experiments we consider, we often cannot consider oscillations with one well-defined baseline: either (1) the experiment is sourced by multiple reactors, or (2) the physical extent of the core or detector (or both) is not much smaller than the distance between them. In either case, one must generalize the term sin 2 ∆m 2 L 4Eν to a flux-weighted average over baselines. The appropriate quantity is where we define q ≡ ∆m 2 4Eν and introduce f (L) as the distribution of baselines in the experiment. The denominator is the mean-inverse-squared baseline, which we denote L −2 . We discuss our implementation of these integrals below.

Implementation in GLoBESfit
The survival probability P ee is calculated in GLoBES with custom oscillation engines using the function glbDefineOscEngine, which is called in the main program file. A sample call of this has the following form: glbDefineOscEngine(NUMP, &<exp> probability matrix, &glf get oscillation parameters, &glf set oscillation parameters, ''<engine name>'', user data); We explain the arguments of this function below.
NUMP: The maximum number of oscillation parameters used. For four-neutrino oscillations, this number should be 12; practically, we take it to be 6. <exp> probability matrix: This is the function in which the oscillation probabilities are calculated for experiment <exp>. This function produces a three-by-three matrix P whose elements are the active-active oscillation probabilities (i.e., P [0][0] corresponds to P ee , P[0] [1] to P eµ , etc.). Currently, these functions only calculate P ee ; all other probabilities are fixed to 0. We discuss the evaluation of the probabilities in the next subsection.
glf get oscillation parameters: This function retrieves the current values of the oscillation parameters.
glf set oscillation parameters: This function sets the values of the oscillation parameters.
''<engine name>'': This is the internal name of the oscillation engine, which is called in the corresponding AEDL file via $oscillation engine=''<engine name>''.
user data: A void pointer to user-specified data that are relevant for the evaluation of the probability. We discuss the use of this apparatus below.

Custom Probability Matrices
For most of the experiments we consider, it is sufficient to take the core to be point-like but to give the detector finite extent. We usually make the approximation that the detector constitutes a (portion of a) spherical shell centered around the core. The function f (L) is then flat over the extent of the detector, which we consider to be L a < L < L b ; this leads to a relatively simple expression for L −2 , as well as the function F (q) in Eq. (A.8): where Si(x) = x 0 sin t t dt is the sine integral. For each experiment of this sort, the width of the detector is specified in GLoBESfit as the sole element of an array typically named <exp> wide. This is passed into the probability engine by setting "user data" to be "(void *) <exp> wide" in glbDefineOscEngine, above. The function in Eq. (A.10) is evaluated, using this input, with a helper function called lsin(q, La, Lb). Here, La and Lb are the nearest and further baseline at the experiment, determined from the width and the (average) baseline thereof. Note that all lengths in GLoBES are assumed to be in km. Moreover, since GLoBES also assumes all energies are in GeV, q in this function is numerically given by q = 1.267 Occasionally, the experimental geometry results in a more complicated f (L) whose structure we must take into account. This distribution is estimated using a simple Monte Carlo routine. Pairs of points are randomly drawn from the core and detector, whose geometry we must specify by hand. One can then numerically integrate over this distribution to determine L −2 and F (q); this numerical integral is performed as a sum over these pairs.
To save computation time, F (q) is precomputed on a grid and is fed into GLoBESfit as a look-up table, whose elements are of the form { log 10 q, F(q)}, that resides in auxiliary header files. Linear interpolation in log 10 q is used to get intermediate values. If q is less than some minimum specified value q min , then it is assumed that the integral scales quadratically and F (q) is determined by extrapolation. If q is instead greater than some maximum specified value q max , then it is assumed that F (q) has averaged out to 0.5. The values of q min and q max depend on the experiment under consideration. We adopt this method for Daya Bay and RENO for both the rate and spectral analyses; moreover, we do this for Bugey-3, 21 DANSS, Double Chooz and NEOS for our spectral analysis.
The precomputed F (q) is fed into the custom oscillation engine via the "user data" option. This is handled by a custom data structure called "glf distance data", whose elements are the length of the precomputed array and a pointer to the array itself. The structure for experiment <exp> is named "<exp> s"; this is handled by by setting "user data" to be "(void *) &<exp> s" in the definition of the oscillation engine.
For medium-baseline experiments, each detector and reactor can be considered point-like, but these experiments are sourced by multiple reactors. The reactors generically have different average powers over the duration of the experiment; these weight their contributions. The integrals over baselines in L −2 and F (q) are replaced by sums: where r indexes the reactors, of which there are N r , and 21 We do not use this technique for Bugey-3 or Bugey-4 for the rate-only analysis because we expect the antineutrino spectrum to be more sensitive to the specific experimental geometry than the total event rate.
the average power of each is P r . For rate-only analyses, the multiple reactors at Chooz, Double Chooz and Palo Verde are handled using different AEDL experiment definitions; the resulting event rates are added at the analysis level. Meanwhile, for rate-only analyses of Daya Bay and RENO, as well as the spectral analyses at Daya Bay, Double Chooz and RENO, the contributions of multiple reactors are combined at the level of the AEDL file. This is purely a function of the development history of GLoBESfit and does not reflect differing physics in these cases.
there again exist functions for both two-flavor (glf two state probability matrix) and four-flavor (glf four state probability matrix) oscillations.
glf probability.h: The header file associated with glf probability.c. This is called by both glf rate.c and glf spectrum.c. glf type.h: The header file wherein the structure "glf distance data" is defined. This is called by several files.
Makefile: The file used to make the executables ./glf rate and ./glf spectrum (see below). The user may need to modify this file to link to their local GLoBES libraries -specifically, the "prefix" option may need to be set to the location of the user's GLoBES directory.

Rate Files
We next consider files specific to the rate module. rate combo{2,3}.glb: The files containing the AEDL definitions of the experiments we will consider, described in Sec. III. These files employ the HM flux model. The order of the experiments is important in the evaluation of the chi-squared; we enumerate the experiments contained in each file below.
• Short-baseline experiments are contained in rate combo.glb: These numberings correspond to the internal GLoBES ordering. Note that some AEDL files apply to more than one experiment. The ordering of these experiments is a function of the development history of GLoBESfit and is in no way a commentary of any of these experiments.

Daya
We define a GLoBES rule for each of the four main, fissile isotopes for each experiment. This allows the user to separate out contribution of each isotope in whatever calculation they are interested in. Moreover, it cuts down on the number of flux files that one must keep on hand; instead of defining a separate flux file for each experiment, one need only combine the fluxes from each isotope with the appropriate fuel fraction. Moreover, the theoretical uncertainties on the isotopic IBD yields for 235 U, 238 U, 239 Pu and 241 Pu are declared in the AEDL definition for Bugey-4/Bugey-3 (15 m) in the field @sys on errors.
We have also made clones of this file that modify how these systematics are treated; see below.
Because the absolute numbers of events are not relevant for our analyses, @time, @power, @norm and $target mass are all generally set to 1.0 in the AEDL definitions; we comment below on instances in which we deviate from this. The fluxes from each of the four main fissile isotopes for each experiment are considered separately and reweighted by the effective fuel fractions given by each experiment. Consequently, @time, @power and @norm are set to be the same for each component of the flux. The value of $length tab is set to 1/ L −2 for each experiment; Exceptions to this occur for medium-baseline experiments -namely, Palo Verde, Double Chooz, Chooz, Daya Bay and RENO -which are typically sourced by multiple reactors, or are comprised of multiple detectors, whose relative contributions must be taken account.
• For Palo Verde and Chooz, @norm is set equal to the numerical value of the exposure, in GW·yr, for each detector, which we have calculated in Sec. III.
• We do something similar for Daya Bay. We set @power equal to the total power delivered to each detector, namely s r t s d P s r , (B.1) where r indexes the eight reactor cores and s indexes the periods of operation; the Daya Bay rateevolution analysis is based on the 1230-day data set from Ref. [57], so we only consider the 6AD and 8 AD periods. We set @norm to be equal to each glf spectrum chi.c: The file containing the definition of the chi-squared function(s) detailed in Sec. V.
glf spectrum chi.h: The header file corresponding to glf spectrum chi.c. This is called by glf spectrum.c. glf spectrum.c: The main file that gets run by the executable ./glf spectrum. The structure of this file is similar to main rate.c, above.

Executables and Options
./glf rate: The main executable for the rate-only aspect of GLoBESfit.
./glf spectrum: The main executable for the spectrum aspect of GLoBESfit.
We summarize particularly useful command-line options common to either executable.
• -bN: Allows the user to turn off a particular block of experiments corresponding to the index N. (We call these "blocks" because "experiment" can be ambiguous in the parlance of GLoBES. Removing Daya Bay from the calculation, for instance, requires that we ignore several experiments in the AEDL file.) We comment on these blocks below.
• -rN: Sets the number of points used in each direction in the parameter scan to be N. By default, N = 51. Sets the output file of the parameter scan to be [file name]. If left blank, then the output is set to stdout, i.e., it writes to screen.
Additionally, there are some options that are specific to the rate-only analysis.
• -S: Turns on systematics stemming from the HM flux predictions. This changes the chi-squared function from glf rate chi nosys to glf rate chi.
• -M: Replaces the HM fluxes with the SM fluxes, using the systematic uncertainties of the former. This changes the chi-squared function from glf rate chi nosys to glf rate chi SM.
• -H: Replaces the HM fluxes with the HKSS fluxes, including their own uncertainties. This changes the chi-squared function from glf rate chi nosys to glf rate chi HKSS.
• -u: Turns on systematics stemming from 238 U and 241 Pu and scans over the normalizations of the 235 U and 239 Pu fluxes instead of considering a sterile neutrino. This changes the chi-squared function from glf rate chi nosys to glf rate chi unfix.
Note that these options are mutually exclusive and have been listed in order of priority -for instance, entering "-M -S -H" at the command line will be treated as if only "-S" has been entered. There is one further ratespecific option: • -w: In addition to the values of sin 2 2θ ee , ∆m 2 41 and χ 2 , this option toggles writing the best-fit values of the four nuisance parameters ξ 235 , ξ 238 , ξ 239 and ξ 241 to disk, in that order. This only works in conjunction with -S, -M and -H.
Finally, we list some options that are specific to the spectral-ratio analysis.
• -T: In addition to scanning over sin 2 2θ ee and ∆m 2 41 , this option triggers an additional scan over sin 2 2θ 13 on the range [0.0, 0.2]. (This option has been used to perform the calibration scan whose results are shown in Fig. 33.)

Experimental Blocks
Currently, glf rate is not set up to isolate every individual experiment. However, given that the covariance matrix decomposes into blocks, this allows us to trivially remove blocks of experiments from the analysis. Most of these blocks consist of only one experiment, while the rest include multiple experiments. Here, we tabulate which experiments correspond to which of these blocks.
The number associated to each block is the appropriate choice of N to be specified at the command line. As an example, if one wanted to remove Nucifer and RENO from the analysis, one would include "-b9 -b14" at the command line.
glf spectrum uses similar options for toggling on/off experimental blocks. However, since we treat the six above-mentioned experiments as uncorrelated, each experiment resides in its own block, i.e., there is a one-toone mapping of experiments to blocks.    Bay. The first row shows the total fiducial target mass of the detector in metric tons [56]. The second row shows the total efficiency εtot for each detector [55]. The third row shows the effective baseline of each detector in meters, calculated using the distances in Table XIII and the powers in Table XIV. The last four rows are the average effective fuel fraction at each detector at Daya Bay, assumed to be the same for the 6AD and 8AD period [55].
(d) The spectrum measured at NEOS [65] relative to the Daya Bay spectrum [55].