Sterile neutrinos and the global reactor antineutrino dataset

We present results from global fits to the available reactor antineutrino dataset, as of Fall 2019, to determine the global preference for a fourth, sterile neutrino. We have separately considered experiments that measure the integrated inverse-beta decay (IBD) rate from those that measure the energy spectrum of IBD events at one or more locations. The evidence that we infer from rate measurements varies between ≲ 3σ and negligible depending on the reactor antineutrino flux model employed. Moreover, we find that spectral ratios ostensibly imply ≳ 3σ evidence, consistent with previous work, though these measurements are known to be plagued by issues related to statistical interpretation; these results should therefore be viewed cautiously. The software used is the newly developed GLoBESfit tool set which is based on the publicly available GLoBES framework and will be released as open-source software.


Introduction
Nuclear reactors have been workhorses for neutrino physics since its inception as an experimental science by Cowan and Reines [1]. A number of important measurements of fundamental neutrino 1 properties have been made using reactor neutrinos: we highlight the confirmation of solar neutrino flavor conversion by KamLAND [2] and the measurement of θ 13 [3][4][5]. Neutrinos are not produced in the fission process itself but in the subsequent beta decays of neutron-rich fission fragments. The first reactor neutrino experiments already had to deal with the question of how to predict the neutrino flux from these complex sources [6]; interestingly, the uncertainty quoted was 5-10%. The neutrino flux question has taken on a central role with reactor antineutrino anomaly (RAA) [7]: an initial reactor flux re-evaluation [8], later confirmed by one of the present authors [9], lead to an increase in predicted reactor neutrino event rates and thus to an observed deficit of reactor neutrinos.
An exciting, albeit speculative, explanation would be provided by oscillations into a sterile neutrino with ∆m 2 1 eV 2 . Unsurprisingly, this hypothesis has triggered significant interest [10], resulting in a number of new reactor neutrino experiments aimed at finding an actual oscillation signature. At the same time, this interest also resulted in a significant effort, both theoretically and experimentally, to improve reactor neutrino flux predictions; see, for instance, ref. [11] and references therein.
The interpretation of reactor neutrino data in terms of sterile neutrino oscillations is relatively straightforward and free of major tension, even when combined with other available electron neutrino disappearance data; see, e.g., refs. [12,13]. However, when an attempt is made to explain also the anomalous results in theν µ →ν e and ν µ → ν e channels as observed by LSND [14] and MiniBooNE [15] significant tension arises: for a sterile neutrino to be able to mediate the transition between two different active flavors, the sterile neutrino must mix with both of these flavors. As a result, there is a well-defined relationship between the amplitude of ν µ → ν e oscillations and the expected effect in the disappearance channels ν µ → ν µ and ν e → ν e . The size of the RAA is consistent with sterile neutrino interpretation of LSND and MiniBooNE, but no evidence is found for JHEP01(2021)167 commensurate disappearance in the ν µ → ν µ channel. This issue has persisted for some time and there is debate about the statistical significance of this tension, for a recent review see ref. [16]. Nonetheless, the lack of ν µ disappearance presents a considerable deficiency in the sterile neutrino hypothesis. Here, we remain agnostic with respect to LSND and MiniBooNE and focus entirely on the electron neutrino disappearance sector.
Given the significant ongoing experimental effort in reactor neutrino experiments, combined with the potential for a major discovery of new physics, we will discuss in detail the existing reactor antineutrino data. We will compare the data with the Huber-Mueller flux model, as well as with two state-of-the-art flux predictions, one based on the summation approach [17] and one based on the conversion approach [18]. We will critically examine how these differing flux models affect the evidence for the sterile neutrino hypothesis. Moreover, we present an open-source software framework called GLoBESfit based on the established GLoBES [19,20] software suite, which will allow users to fit their preferred model to the data. We also hope that this open framework can be adopted by the experimental community to share their results.
A major result of this work is that we find if we restrict the analysis to data taken after 2010 -i.e., the modern θ 13 reactor experiments and their short-baseline counterparts -then the reactor rate anomaly seems to be enhanced. This restriction should not be misconstrued as a lack of trust in the older experiments, but it does reflect the fact that data curation and release practices were less refined in the past. As a result, much less detailed information is available for properly including older experiments into a global fit.
Best practices in data sharing are crucial for this field to make progress, since we have previously shown [21] that no single, existing reactor experiment can definitively test the currently preferred parameter space. Therefore, global fits will play an important role in neutrino physics for the foreseeable future, reinforcing the value of an open fitting framework.
This manuscript is arranged as follows. In section 2, we discuss the predictions of the fluxes of antineutrinos at nuclear reactors that we use, including how systematic uncertainties on these predictions are incorporated in our analyses. In section 3, we discuss the total IBD rate experiments that we have included in our fits, and in section 4, we present our combined analyses of various subsets of these data. Similarly, in section 5, we discuss the IBD spectrum experiments that we have included in our analyses, and in section 6 we present combined analyses of these experiments. We address the effect of the infamous 5 MeV bump [22][23][24] on sterile neutrino searches in section 7, and offer concluding remarks in section 9.
In addition to communicating our findings, one of the intentions of this manuscript is to provide documentation for the tools that we have developed. Pursuant to this, we have included several appendices that deal with technical aspects of GLoBESfit. In appendix A, we provide an account of how oscillations involving a sterile neutrino have been included in GLoBES, and in appendix B, we provide a general overview of the files that constitute GLoBESfit, including their content and some aspects of their functionality. Lastly, we provide extensive data tables and supplemental figures in appendix C; these data are important for our analyses, but would have been intrusive to have included in the main text.

JHEP01(2021)167 2 Reactor antineutrino flux models
Predictions of the spectra of antineutrinos from reactors have played a central role from the inception of neutrino physics as experimental science [6]. More recently, the RAA [7] has put a spotlight on flux models; for a recent review on flux models and how they are produced, see ref. [11]. For the discussion here, we note the following the salient features. Neutrinos are produced not by the fission process itself, but by the beta decays of about 800 fission fragment isotopes, corresponding to about 10,000 beta decay branches. A large fraction of these beta decays are of (unique and non-unique) forbidden type, which implies a significant dependence on details of nuclear structure for both the emitted electron and antineutrino spectra. We will use the following three flux models: 1. Huber-Mueller (HM): this flux model [8,9] has become the de facto standard in the field and is based on the conversion of integrated beta spectrum measurements performed in the 1980s at the Institut Laue-Langevin [25][26][27]. This flux model has relatively little dependence on nuclear databases and employs the allowed approximation, i.e., all beta decays are treated as allowed decays.
2. Ab initio: this flux model [17] uses fission fragment yields and information on individual beta decays from nuclear databases to directly compute the neutrino flux. It also employs the allowed approximation.
3. Hayen-Kostensalo-Severijns-Suhonen (HKSS): this flux model [18] shares many of the features of HM, that is, it is based on a conversion of the integrated beta spectra. 2 Importantly, it is the first attempt to include effects from forbidden decays in a systematic fashion.
The ratios for the ab initio and HKSS flux predictions relative to the HM predictions are shown in figures 1a-1d for, respectively, 235 U, 238 U, 239 Pu and 241 Pu. In each panel, the orange line at 1.0 represents the HM predictions; the blue curve represents the ab initio predictions; and the dark cyan curves represent the HKSS predictions. The associated bands represent the (1σ) systematic uncertainties on these predictions; we discuss how these are calculated below. For comparison, we also show the measured neutrino spectrum from Daya Bay [28] (red) and PROSPECT [29] (pink). Daya Bay can separate the contributions from 235 U and 239 Pu by comparing data taken at different points in the fuel cycle. Meanwhile, PROSPECT is deployed at HFIR, which is fueled with highly-enriched uranium meaning nearly all fissions are of 235 U. Visual inspection indicates that all three flux models and the neutrino data agree for 239 Pu within the large error bars. However, the ab initio flux model predicts a lower flux for 235 U than either HM or HKSS, which seems to agree somewhat better with data. Comparison of the data with any of three flux models reveals a bump in the region around 5-6 MeV. In later sections, we will perform a more careful analysis; impressions from this intial inspection will essentially be confirmed.
In all these calculations, one should formally be accounting for nonequilibrium corrections to the antineutrino flux; these arise from the build-up of long-lived fission products in JHEP01(2021)167 the core, which may take tens of days to achieve secular equilibrium and whose contributions to the isotopic fluxes are time dependent. This contribution has been approximated by Mueller, et al., in ref. [8] for 235 U, 239 Pu and 241 Pu. However, the collaborations often do not publish information about their burn-up, so it is unclear how large the nonequilibrium correction should be. Even taking a nonequilibrium correction corresponding to 450 days of irradiation -the largest such correction presented in ref. [8] -the difference between this the absence of nonequilibrium correction is typically overwhelmed by both experimental and other theoretical uncertainties. Moreover, the uncertainty on the magnitude of this contribution is of the order of ∼30%. Therefore, we do not expect these to meaningfully impact the final results; we therefore elect to exclude them from the present analyses.
The experimental quantity of interest in all cases is the convolution of the neutrino flux with the inverse beta decay (IBD) cross section σ IBD and the electron-type survival probability P ee (see eq. (A.7) in appendix A). We define the flux weighted cross sections σ i X as follows: This quantity provides the IBD rate for antineutrinos with true energy in the range [E min , E max ] calculated using flux model X = HM, AI, HKSS for isotope i = 235, 238, 239, 241. The total IBD rate in this energy window is then the sum of the σ i X , weighted by the appropriate effective fuel fraction f i , i.e., the fraction of fissions due to each isotope during the operation of the experiment. In this work, we use the IBD cross section published in ref. [30] -particularly, the results up to and including O(1/M ) as given in their eqs. (14) and (15). The uncertainty on this cross section is negligible compared to the uncertainties on the fluxes.
We consider two types of reactor antineutrino measurements in this work. In sections 3 and 4, we consider experiments that measure the total IBD rate, but not necessarily the specific energy of a given event. For these, we will take [E min , E max ] = [1.8, 8.0] MeV. On the other hand, in sections 5 and 6, we consider measurements of the IBD spectrum over various energy bins, each with its own [E min , E max ]. Our treatment of the flux systematics at total IBD rate experiments is common to all such experiments; the flux systematics at spectrum experiments must be handled on a case-by-case basis.
We use the following generic procedure to incorporate systematic uncertainties stemming from flux predictions into our analyses.
1. We use information presented in the literature to assign a covariance matrix to a set of flux predictions.
2. We then generate random variations of these fluxes using this covariance matrix to numerically calculate σ i X in eq. (2.1), assuming P ee (E ν ) = 1, for some specified [E min , E max ].
3. For total rate analyses, we take [E min , E max ] = [1.8, 8.0] MeV and calculate the σ i X and their correlations. These results will be applied to all total IBD rate measurements simultaneously, as will be described in section 4.  [8,9] and are, of course, equal to one everywhere. The blue curves are for the ab initio fluxes presented in ref. [17]. The dark cyan curves are for the updated conversion method calculation presented in ref. [18], which we call HKSS. The shaded regions represent the corresponding uncertainties, being equal to the square root of the diagonal elements of the global covariance matrix. Note that the ab initio fluxes have no corresponding uncertainties; see text for details. The data points are based on antineutrino measurements performed at reactors, Daya Bay [28] (red) and PROSPECT [29] (pink).

JHEP01(2021)167
4. For spectral analyses, we calculate fractional deviations in the individual bins, including correlations. Usually, we will be interested in the ratio of two such spectra. The resulting covariance matrices are then combined with other experimental systematics; see sections 5 and 6 for more details.
When we numerically integrate over the predicted fluxes, we do so by logarithmically interpolating/extrapolating on the published flux models.

HM fluxes.
For the HM fluxes we use the central values published in tables VII-IX of ref. [9] for the fluxes from 235 U, 239 Pu and 241 Pu, respectively, whereas we use the values from the "N ν , 12 h" column of table III of ref. [8] for 238 U. Regarding errors, we use the following prescription. For 235 U, 239 Pu and 241 Pu, we take the (percent) errors listed in the columns headed "stat." and "bias err." from tables VII-IX of ref. [9], respectively, to be fully uncorrelated between bins and these three isotopes; and the columns headed "Z," "WM" and "norm." are added in quadrature and taken to be fully correlated between bins and these three isotopes. We assume all errors to be Gaussian; asymmetric errors are replaced by a Gaussian with width given by the geometric mean of the upper and lower uncertainties. For 238 U, we combine the errors in the "Nuclear databases" column of table II in ref. [8]-the correlations of which are shown in table I of the same reference -along with the "Missing info." column of the same -which we take to be fully correlated. These uncertainties yield the orange bands in figures 1a-1d.

JHEP01(2021)167
In our analyses, we will always apply the systematic errors associated to the HM fluxes to the ab initio fluxes, as well. This is likely an optimistic assignment; though precise uncertainties have not been calculated, the true systematic uncertainty on these predictions is likely of order 5%. However, we will argue below that this treatment is sufficient for the conclusions of this work. For now, we simply point out that this treatment results in a more aggressive assignment of statistical significances than a more realistic error budget would produce.
We calculate the following isotopic IBD yields: HKSS fluxes. The fluxes are presented in tables VI-IX of ref. [18]; specifically, we use the results in the columns headed "δN Num.," which represent the (percent) rescaling introduced by including first-forbidden decays to the reactor antineutrino flux. Systematics are handled by splicing together a subset of the errors published in ref. [18] with those for the HM fluxes from refs. [8,9]. Specifically, we use all the components of the HM uncertainty budget discussed above, and we add to this the uncertainties under the columns headed "g A " and "Param." in tables VI-IX of ref. [18]; these latter components are each taken to be totally correlated between bins and isotopes. These uncertainties yield the dark cyan bands in figures 1a-1d. The procedure outlined above gives us the following isotopic IBD yields: σ 235 HKSS = (6.67 ± 0.15) × 10 −43 cm 2 /fission σ 238 HKSS = (10.08 ± 1.14) × 10 −43 cm 2 /fission σ 239 HKSS = (4.37 ± 0.12) × 10 −43 cm 2 /fission σ 241 HKSS = (6.06 ± 0.14) × 10 −43 cm 2 /fission

JHEP01(2021)167
Note that these have increased by ∼ 1% relative to the nominal HM predictions. The uncertainties have also grown, stemming from uncertainty related to g A and the parametrization procedure used in ref. [18], but the change is quite modest -including first-forbidden contributions has not resulted in a grossly inflated error budget. These results are described by the following correlation matrix:

Rate experiments
In this section, we describe the experiments that have measured the IBD rate that enter into our analysis and how these are defined in GLoBES. We are ultimately interested in the ratio of the measured IBD rate relative to one of (1) the HM predictions, (2) the ab initio predictions, or (3) the HKSS predictions. These are compared against the experimentally determined ratios relative to the same flux prediction. Many of the details regarding specific implementations of these experiments are the same between them; we relegate discussion of these to appendix B.
In the following, we will assign energy resolutions to experiments for which spectral information is unavailable. This is a necessary input for GLoBES; these assignments are arbitrary and do not impact the energy-integrated rate analyses presented here. Moreover, the uncertainties of each of these measurements -and the correlations between themhave been adapted from refs. [13,32].

Short-baseline experiments
Bugey-4. The detector width is taken to be 1.3 m and the core is assumed to be pointlike. The center-to-center distance between the core and the detector is 15 m. The fuel fractions published by the collaboration [33] are The experimental resolution is not published by the collaboration. In the absence of further input, we take the resolution to be 6%/ E [MeV], which is the stated resolution for Bugey-3. The published total IBD yield is σ = 5.752 × 10 −43 cm 2 /fission [33]; this leads us to R HM = 0.941, R AI = 0.979 and R HKSS = 0.933. The experimental uncertainty is taken to be 1.4%, which is entirely correlated with Rovno 91. The resolution is not published by the collaboration. In the absence of further input, we take the resolution to be 6%/ E[MeV], as for Bugey-4. The absolute IBD cross section measured at Rovno 91 is σ = 5.85 × 10 −43 cm 2 /fission [34]. We find R HM = 0.939, R AI = 0.983 and R HKSS = 0.931. The total experimental uncertainty is 2.8%, of which 1.4% is correlated with Bugey-4.

JHEP01(2021)167
Bugey-3. The collaboration reports IBD rate measurements at 15 m, 40 m and 95 m. The fuel fractions published by the collaboration [35] are: which we assume applies to all three measurements. The collaboration claims an energy The collaboration only publishes the ratios of their measurements with respect to refs. [26,27]; we instead derive IBD cross sections from their stated event rates and experimental specifics and find σ = {5.75, 5.79, 5.33} × 10 −43 cm 2 /fission. We find R HM = JHEP01(2021)167

ILL.
We mostly follow ref. [37]; however, we also correct for the fact that ref. [38] reports that the original power of the ILL experiment was underreported by 9.5%. The detector width is 2.0 m, the core is assumed to be point-like and the center-to-center distance is 8.76 m. The core consists of 93% enriched 235 U; our analysis assumes that this is the only relevant isotope. The energy resolution is taken to be 18%/ E [MeV].  Ref. [41] also includes measurements of charged-and neutral-current deuterium disintegration rates, but these are significantly less precise than the IBD rate measurement; we do not include these in our fits. Savannah River. The detector is taken to have a width of 2.0 m and the center-to-center distances are 18.2 m and 23.8 m in the two detector configurations with a core of essentially pure 235 U. The energy resolution is taken to be 20%/ E [MeV]. The collaboration does not explicitly report their inferred IBD rate, but from the integrated rates in  [43]. The first two (1I and 2I) are polyethylene detectors studded with 3 He proportional counters, each located ∼18 m from the core. The latter three (1S, 2S and 3S) are liquid scintillation detectors; 1S and 3S are located 18 m from the core, while 2S is located 25 m from the core. 6 Each detector has a width of 1.0 m, which is roughly consistent with figure 1 of ref. [43], and the resolution is taken to be 20%/ E [MeV].

JHEP01(2021)167
Each measurements takes different effective fuel fractions: (  [43]. However, we expect these differences to be small, so we ignore them.  In table III of ref. [43], the collaboration has provided their inferred IBD rate for each of these five experiments. However, in table II of the same reference, they provide various experimental data that can, in principle, be used to rederive their results. Using the information in that Nucifer. Given the short distance between the core and detector (7.21 m), one might worry that the finite extent of both the reactor core and of the detector may be important in determining the correct oscillation probabilities. However, given that Nucifer reports a total rate measurement, and given the relatively poor energy resolution, we find that this effect is not quantitatively important here. The composition of the core is [45] (f 235 , f 238 , f 239 , f 241 ) = (0.926, 0.008, 0.061, 0.005) .

JHEP01(2021)167
The core is dominated by 235 U, but since the effective fuel fractions for all four isotopes have been provided, we take these into account. The collaboration claims an energy resolution of 20%/ E [MeV]. From their reported event rate and other experimental specifics, we estimate the IBD rate to be σ = 6.847 × 10 −43 cm 2 /fission. This leads to R HM = 1.046, R AI = 1.115 and R HKSS = 1.036. The experimental uncertainty is 10.7% and is uncorrelated with any other experiment.

Medium-baseline experiments -Total rates
We now turn our attention to medium-baseline oscillation experiments. At these distances, O(10 2 − 10 3 ) m, two-flavor oscillations are insufficient; we must at least account for nonzero θ 13 and ∆m 2 31 (or the effective mass splitting ∆m 2 ee ). In practice, we include full four-neutrino oscillations, though we keep the relevant parameters from three-neutrino oscillations fixed at their best-fit values from ref. [46]: Note that sin 2 θ 23 does not factor into this analysis.

JHEP01(2021)167
We define each detector width to be 4.0 m and take the reactors to be point-like; this helps high-frequency oscillations to average out smoothly, but otherwise does not impact our analysis. Since medium-baseline experiments are sourced by multiple reactors, our GLoBES definitions must account for the relative contributions of the reactors.
As mentioned, three-neutrino effects are important at these baselines; this can cause some ambiguity in the definition of the ratio R. In this work, we always assume that R is defined with respect to the absence of oscillations. We contrast this with the expected deficit within the three-neutrino framework relative to the absence of oscillations, which we will call R 3ν .
Palo Verde. Palo Verde [47] consists of three reactor cores; one is located 750 m from the detector and the other two are at 890 m. Table I of ref. [47] presents the operating cycle over the course of the experiment. Assuming the total power of 11.63 GW th is split evenly among the three reactors over the duration of the experiment and accounting for efficiency, the total exposure from the 750 m reactor is 372.6 GW·yr, whereas the total exposure from both 890 m reactors is 706.0 GW·yr. The fuel fractions are not stated in ref. [47], so we take them from refs. [13,32]: The resolution is also not stated in ref. [47], so we assume it to be 20%/ E [MeV].
We use information in ref. [47], along with the total proton number in ref. [48], 7 to determine σ = 6.036 × 10 −43 cm 2 /fission. From this, we calculate R HM = 0.971, R AI = 1.014 and R HKSS = 0.963. The uncertainty is taken to be 5.4%, which is uncorrelated with any other experiment. For context, ref. [50] reports a three-neutrino expectation of R 3ν = 0.967. Double Chooz. Double Chooz [51,52] consists of two reactors and two detectors; this measurement pertains to the near detector, from which the two reactors are 355 m and 469 m away. We assume that each reactor operates for the same amount of time over the course of the experiment, and that the average powers delivered during those times are the same. The effective fuel fractions are assumed to be the same for each reactor: We take the energy resolution to be 8%/ √ E [51]. Ref. [52] reports σ = 5.71 × 10 −43 cm 2 /fission. However, this result has been corrected for the effects of nonzero θ 13 . We calculate R HM = 0.934, R AI = 0.969 and R HKSS = 0.926. The corresponding uncertainty is 1.4%, which is uncorrelated with any other experiment.
Chooz. Chooz [53] consists of two reactors located 998 m and 1115 m away from the detector. The two reactors have different run times and effective powers; the operation periods are broken down in table 1 of ref. [53]. We assume that the reactors operate at JHEP01(2021)167  We take the energy resolution to be 0.5 MeV; ref. [53] claims a resolution of 0.33 MeV, but we use the more conservative estimate here. From ref. [53], we derive σ = 5.71 × 10 −43 cm 2 /fission, giving us R HM = 0.976, R AI = 1.013 and R HKSS = 0.968. The experimental uncertainty is 3.2%, which is uncorrelated with any other experiment. For context, ref. [50] reports a three-neutrino expectation of R 3ν = 0.954.

Medium-baseline experiments -Rate evolution
Daya Bay. The details about the experimental geometry are well documented in refs. [54,55]. We tabulate relevant experimental specifics in tables 13-17 in appendix C. The fuel evolution result that we use corresponds to the 1230-day data release; an analogous analysis for the 1958-day data appeared in ref. [28], which we look forward to including in the future.
AD8 did not operate for as long as AD1-3 in the 1230-day dataset; this is characterized by the exposure for detector d, defined as where P 6AD,8AD days, but AD8 has t 6AD d = 0 days. Moreover, we account for the different total AD target masses and efficiencies, given in table 15.
The Daya Bay data are presented in bins of differing fuel fractions. We present the average fuel fractions of these bins and the corresponding IBD rate measurements in table 16. These IBD rates have been inferred by the collaboration accounting for three-flavor oscillations. To construct our analyses, we correct these measurements by reinserting these effects, assuming the best-fit values presented in ref. [55]: Our calculations of the ratios R HM , R AI and R HKSS are also shown in table 16. Our analyses employ the covariance matrices provided by the collaboration in the Supplementary Material to ref. [56], appropriately converted from absolute IBD rate to ratio with respect to our IBD predictions. Ref. [57] presents a total IBD rate measurement whose systematic uncertainty has been decreased by 29% relative to prior measurements. Following the prescription of ref. [58], we have decreased the systematic uncertainties from ref. [56] in our analyses by this amount.

RENO.
In  and in table 20, we show other important experimental information, including the detector operating time and efficiency. These data will also be used in the RENO spectral analysis; here, we only need the information for the near detector.
We take the flux-weighted average baseline to be 433.1 m. Out definition of "average baseline" is different from that of RENO [60]; we weight by the reactor flux at the detector, but RENO considers the average distance that a neutrino that has already interacted in the detector would have traveled. This is why we don't use the 410.6 m that the collaboration publishes. We take the energy resolution to be a constant 0.4 MeV.
As with Daya Bay, the RENO collaboration reports their total IBD yield for eight different sets of fuel fractions; we present these fuel fractions and the corresponding measured IBD rates, obtained by digitizing figure 2 in ref. [60], in table 21. As with Daya Bay, these IBD rates have been corrected for that three-flavor oscillations that have been removed in these measurements, assuming the best-fit values presented in ref. [61]: In table 22, we show our estimates of the ratios using the HM, ab initio and HKSS fluxes. The systematic uncertainty is given by the (quadrature) sum of the thermal power (0.5%), fission fraction (0.7%) and detection efficiency (1.93%) uncertainties; this totals 2.1% and is assumed to be totally correlated between data points.

Rate analyses
In this section, we discuss how the experiments in the previous section constrain the mass and mixing of an additional sterile neutrino. This is one possible application of GLoBESfit-the analysis techniques we describe can be modified for any one of a number of new-physics scenarios.

Data -A combined view
We depict the data discussed above in figures 17 and 18. The orange data points represent the ratio R of measured IBD rates relative to the prediction calculated using the HM flux model. The blue data points are the same using the ab initio fluxes; the dark cyan points employ the HKSS fluxes. The depicted error bands are experimental only; theoretical uncertainty is of order ∼ 2.5%. As discussed in section 2, the reduction of the antineutrino flux from 235 U, in particular, in the ab initio fluxes results in a diminished predicted IBD rate, implying larger experiment-to-prediction ratios. On the other hand, the HKSS fluxes almost universally result in a higher predicted flux, resulting in a diminished ratio R.
For each group of correlated measurements, we show the 68.3% and 95% confidence level (C.L.) contours derived for each flux model with solid and dashed curves, respectively, JHEP01(2021)167 In tables 1, 2 and 3, we tabulate relevant statistics for our analyses using the HM, ab initio and HKSS fluxes, respectively. Specifically, we show the minimum value of the χ 2 for each group of experiments, χ 2 min , as well as the χ 2 in the limit of no mixing with a sterile neutrino, χ 2 3ν . The difference between these is used to determine the p-value for each group of experiments, assuming these are chi-squared distributed.

Combining experiments
To analyze all of these experiments simultaneously, we combine the uncertainties on the integrated rate experiments into one covariance matrix, the elements of which are given by correlated and uncorrelated uncertainties described in section 3. We use separate covariance matrices for Daya Bay and RENO, which have been provided by the respective collaborations. We incorporate systematic uncertainties stemming from the flux predictions via four nuisance parameters (one for each of 235 U, 238 U, 239 Pu and 241 Pu), which we bundle into a vector ξ. These flux predictions are correlated; we account for these correlations using the covariance matrices discussed in section 2.
The chi-squared function we employ has the following form: where R exp is the vector of experimental ratios and R pred = R pred (sin 2 2θ ee , ∆m 2 41 , ξ) is the vector of predicted ratios. Our treatment of theoretical uncertainties has been previously described in section 2; the theoretical covariance matrix V th accounts for these in our calcu-  Table 1. Values of χ 2 for our analysis -namely, the minimum value (χ 2 min ), the value for sin 2 2θ ee = 0 (χ 2 3ν ) -and the p-value at which three-neutrino mixing can be excluded for each experimental block for our analysis of the HM fluxes. The number of data points in each block is denoted n data . We convert the p-value to the number of σ in the last column.

JHEP01(2021)167
lations. Furthermore, V exp is the covariance matrix describing experimental uncertainties. We minimize over ξ for each point in sin 2 2θ ee -∆m 2 41 parameter space. In figures 19, 20 and 21, we separately show the regions preferred by short-baseline (SBL) and medium-baseline (MBL) experiments using the HM, ab initio and HKSS fluxes, respectively. In each figure, we show SBL experiments in blue and MBL experiments in orange. Solid contours represent 95% C.L., whereas dashed contours represent 99% C.L. Relevant statistics are compiled in tables 1-3.
In each case, the MBL experiments prefer a smaller value of ∆m 2 41 than the SBL experiments, as one would expect. In the region above ∆m 2 41 1 eV 2 , we see that the MBL experiments are generally consistent with the region preferred by the SBL experiments. Both SBL and MBL experiments separately indicate 2σ evidence for a sterile neutrino for the HM and HKSS flux models, though the latter is stronger than the former. Conversely, the ab initio fluxes are much more indifferent to the existence of a sterile neutrino; the 95% C.L. curves do not close for either SBL or MBL experiments, as can be seen from figure 20.
In figure 22, we show results from combining all SBL and MBL reactor experiments for all flux models we have considered. The solid and dashed curves depict the 95% and 99% C.L. regions, respectively. As before, the HM fluxes are shown in orange; the ab initio fluxes are show in blue; and the HKSS fluxes are shown in dark cyan.   Using the ab initio fluxes decreases the global preference for a sterile neutrino relative to the HM fluxes: the evidence decreases from 2.5σ to 0.56σ. The reason for this, as we have mentioned, is that the overall flux normalization for the ab initio flux prediction for 235 U is ∼ 10% less than the corresponding HM prediction. Therefore, the ab initio expected IBD rates are less than the HM rates, so the evidence for a sterile neutrino is diminished. Moreover, recall that assigning the HM uncertainties to the ab initio flux predictions is optimistic; the true uncertainties associated with the calculation are assuredly larger than JHEP01(2021)167  the ∼ 2.5% that we have ascribed. Consequently, this flux model shows negligible preference for oscillations involving a sterile neutrino.

JHEP01(2021)167
On the other hand, using the HKSS fluxes increases the global preference for a sterile neutrino: the evidence increases slightly from 2.5σ to 2.6σ. The effect of including firstforbidden decays is to increase the expected flux in the region around E ν ∼ 5 − 6 MeV; the conversion process thus produces a higher overall antineutrino flux, implying a larger deficit than for the HM fluxes. Additionally, the inclusion of first-forbidden decays has not JHEP01(2021)167 dramatically increased the uncertainty budget of this calculation, as had been speculated may happen [62].
The observation of this diverging preference for a sterile neutrino is one of the central conclusions of this work. Clearly, at most one of these flux predictions is correct; determining which is paramount if the community wishes to determine how to best pursue this anomaly in the coming decade. It is critical to note that these predictions are ultimately derived from experimental measurements -the conversion method requires reliable uranium and plutonium fission β spectra, and the ab initio method requires precisely known cumulative fission yields and β decay strengths. Perhaps the most critical step to be taken in improving the antineutrino flux predictions is to improve the underlying data on which they are based.
It is important to stress that HKSS represents the first attempt to systematically include forbidden decays in flux predictions and there are a number of caveats one might raise with respect to the methods used. The shell model works best for spherical nuclei with near-magic numbers of nucleons, properties decidedly absent in many fission fragments, but there are examples where good agreement with experimental result could nonetheless be achieved [63]. The shape factors reported in ref. [18] are quite large but seem to fall within the range of observed shape factors [64]. In this context, it is worthwhile to observe that in ref. [62], it is a single operator, [Σ, r] 1− , corresponding to a large shape factor, that causes by far the largest effect on the neutrino flux. Only if this operator were to dominate a series of beta branches in fission fragments with large cumulative yields could a marked effect on the neutrino spectrum be found. In contrast, HKSS find that despite a mix of large shape factors being used, the impact on the overall flux is modest. This could be indicating that the concerns about forbidden decays in computing reactor neutrino fluxes may have been overstated. Temporal cut: experiments from the 2010s. Given the scarcity of experimental information that accompanies reports of older reactor experiments, one can reasonably ask how the evidence for the existence of a sterile neutrino changes if only recent experiments are included in our analysis. We introduce an ad hoc temporal cut on these reactor datafor concreteness, we only consider experiments from the 2010s, i.e., Nucifer, Double Chooz, Daya Bay and RENO -and derive the constraints shown in figure 23. As in previous figures, the 95% (99%) C.L. contour is shown in solid (dashed), and we simultaneously show results using the HM (orange), ab initio (blue) and HKSS (dark cyan) flux predictions. Relevant statistics are shown in table 4. Making this cut removes most SBL experiments, so the barycenters of the fits shift to smaller values of ∆m 2 41 . Imposing this cut increases the evidence for a sterile neutrino for all three flux models. For the HM and HKSS flux models, the evidence increases to 2.7σ and 2.8σ, constituting modestly strong evidence. On the other hand, the significance for the ab initio flux model rises to 0.81σ.

JHEP01(2021)167
To counteract the disproportionately high relevance of MBL experiments in this fit, we impose an additional ad hoc prior requiring ∆m 2 41 > 0.1 eV 2 and repeat the analysis. The resulting statistics are shown in table 5. The evidence is weakened by imposing this prior, but the same basic trend emerges: the HM and HKSS flux models prefer a sterile neutrino at 2σ, while the ab initio fluxes show negligible preference.

Alternate analysis: rescaling the HM fluxes
We consider an alternative to the sterile-neutrino hypothesis: that the data can be explained by simply rescaling the HM fluxes. In particular, we only consider the two dominant fissile isotopes -235 U and 239 Pu -and introduce a rescaling factor of each, respectively r 235 and r 239 . Similar analyses have been performed in refs. [28,56,58,60,[65][66][67]. We scan over the r 235 -r 239 plane in order to determine the extent to which the data prefer a rescaling of the HM fluxes and to determine if this is a more compelling explanation of reactor rate deficits than introducing a sterile neutrino. As in our sterile neutrino analyses, we treat systematics from 238 U and 241 Pu using nuisance parameters.  Table 6. Relevant statistics from our scans in the r 235 -r 239 plane, as described in the text.

JHEP01(2021)167
In figure 24, we show contours of constant ∆χ 2 in the r 235 -r 239 plane for different subsets of reactor experiments. The red regions refer to our analysis of SBL and MBL experiments that measure the total, time-integrated rate (namely, all of the above experiments except for Daya Bay and RENO); the purple regions refer to MBL experiments that measure the IBD rate as the effective fuel fractions change, i.e., Daya Bay and RENO; and the gray regions refer to a combined analysis of all experiments. Dark (light) shading represents the 95% (99%) C.L. region.
In table 6, we present the minimum value of the χ 2 (χ 2 min ), its value at the point r 235 = r 239 = 1 (χ 2 0 ) -indicated by the orange circle in figure 24 -and both the corresponding p-value and nσ at which the unscaled HM fluxes can be excluded for each dataset. We also show the 1σ (2σ) preferred region for the HM fluxes in dark (light) orange shading. While the fuel evolution dataset exhibits a moderate but clear preference (3.7σ), the integrated rate dataset presents 4.4σ evidence in favor of rescaling the HM flux predictions. Taken together, these data present 5.5σ evidence for this hypothesis, with the 235 U flux, again, being particularly suspect. On the other hand, we can directly compare how the ab initio and HKSS flux models compare to this rescaling of the HM fluxes. The blue and dark cyan circles, respectively, represent the relative sizes of the 235 U and 239 Pu isotopic IBD fluxes as compared to the HM prediction, calculated using our results from section 2. As for the HM fluxes, the appropriately colored regions represent the 1σ (dark) and 2σ (light) regions for these flux models. These three models generally agree on the flux from 239 Pu fissions, but disagree quite severely for 235 U, with the ab initio fluxes presenting a ∼6% deficit and the HKSS fluxes presenting a ∼1% enhancement with respect to HM. We quantify this in table 7, where we show how these alternate flux predictions compare to freely rescaling the 235 U and 239 Pu fluxes.

JHEP01(2021)167
Unsurprisingly, we find that the ab initio fluxes are consistent with the best-fit points from these analyses; the tension between them fails to surpass 1σ for any analysis we have performed. Conversely, the data strongly disfavor the HKSS fluxes: while the fuel evolution data exhibit 4.3σ tension, the integrated rate experiments present 5.3σ tension, and all experiments together present 6.5σ tension with this flux model. This again underscores the need to revisit the data that underpin these flux predictions.
An interesting subset of the rescaling hypothesis is to consider r 235 and r 239 being rescaled by the same factor. The line along which r 235 = r 239 is shown in black dashing in figure 24. In figure 25, we show the values of ∆χ 2 relative to the minimum value in the entire r 235 -r 239 plane as a function of r 235 = r 239 . Shown are the respective curves for integrated rate (red), fuel evolution (purple) and all experiments (black). The solid (dashed) gray, horizontal line is the 95% (99%) C.L. exclusion limit, calculated for two degrees of freedom; we present this as a subspace of the r 235 -r 239 plane of figure 24, and not necessarily as a separate hypothesis.
In table 8, we show the minimum value of ∆χ 2 along this one-dimensional subspace, ∆χ 2 1D , the corresponding p value and equivalent number of σ. The data are modestly consistent with a uniform rescaling of the 235 U and 239 Pu fluxes; a nonuniform rescaling of the 235 U and 239 Pu fluxes is preferred in an absolute sense, but not dramatically so. None of these analyses excludes a uniform rescaling of the HM fluxes by more than 1.1σ. Clearly, the data mildly prefer for the fluxes to be independently rescaled over introducing a sterile neutrino. This is unsurprising: the sterile neutrino hypothesis can be mapped onto rescaled-HM-fluxes hypothesis, assuming ∆m 2 41 is sufficiently large that its corresponding oscillations average out at all experiments. Because the former is a subset of the latter, it cannot be more strongly preferred. Introducing the ad hoc restriction that ∆m 2 41 5 eV 2 to ensure that oscillations would average out at all experiments, we find χ 2 min /d.o.f. = 32.3/39 and p = 0.77 (0.30σ). 9

Spectrum experiments
In this section, we discuss the experiments that enter into our spectral analyses. Unlike the rate experiments, these are all taken to be mutually uncorrelated. Given the increase in the complexity of each measurement -i.e., given that we are considering event spectra instead of integrated event rates -the systematics become much richer than in the previous section(s). Consequently, we discuss the statistical analysis of each experiment separately in what follows.
For these analyses, we exclusively employ the ab initio flux predictions. The reason for this is that, because we are interested in the ratios of spectra, one should expect that 9 We take the number of degrees of freedom to be 39 here. If ∆m 2 41 is large enough for oscillations to average out, then its value cannot be measured and it is not a relevant degree of freedom.

JHEP01(2021)167
the choice of flux model does not affect the results of such a study. One could just as easily use the HM or HKSS fluxes; the result are essentially indistinguishable.
We have limited our analyses to considering Bugey-3 [35], DANSS [68], Daya Bay [69], Double Chooz [52], NEOS [70] and RENO [60]. While the short-baseline reactor experiments PROSPECT and STEREO have produced early results, these are not yet competitive with other experiments in the best-fit region around ∆m 2 41 ∼ 1 eV 2 , so we do not include them at this time. Additional data releases are expected in the near future for both experiments, 10 as well as the SoLid experiment [72]; we look forward to including all of these in future versions of GLoBESfit.

Bugey-3
Implementation in GLoBES. Based on design specifications for a 900 MW e series reactor vessel described in ref. [73], we assume that the height of the active volume of the reactor core at Bugey-3 is 3.66 m tall with a radius of 1.5 m. We further assume that antineutrino production is uniform vertically 11 and azimuthally, but that the radial distribution is given by a zeroth-order cylindrical Bessel function whose first zero lies at the edge of the core.
We only consider the antineutrino spectra measured at the 15 m and 40 m positions. The spectral measurement at 95 m is sufficiently imprecise that we do not expect to gain appreciable sensitivity to a sterile neutrino through its inclusion; this is borne out in figure 16 of ref. [35]. Ref. [35] states that the detector modules used in the experiment are comprised of 98 separate 8. We precompute the oscillation probabilities at both positions using the geometries outlined above. The quantity of interest is the flux-averaged value of sin 2 (qL) at each experiment, where is independent of the experimental geometry. Note that this definition of q assumes that lengths are given in km, which is the expectation in GLoBES. We define this average to be F (q), and discuss it in more detail in the appendix (see eq. (A.8) and the surrounding discussion). We precompute F (q) in intervals of 0.01 in log 10 q over the range q ∈ [10 0 , 10 4 ].
The two positions are included with two separate AEDL experiments in GLoBESfit. We fix @time, @power, @norm and $target_mass to be 1.  [35] shows the ratio of the spectrum measured at 40 m relative to that measured at 15 m. We have digitized this figure and the depicted statistical uncertainties for use in our analyses; we reproduce these data in figure 38a in appendix C. In addition to these statistical uncertainties, the collaboration also claims a 2.0% systematic uncertainty on the relative normalization of the two spectra. We have also accounted for a 2.0% energy scale uncertainty for each position and have assumed that these are totally uncorrelated. These are included in our analyses via the GLoBES function glbShiftEnergyScale; see the GLoBES manual for a description of this function [19,20].
We utilize a chi-squared function of the following form: where S exp is the experimental spectrum and S pred is the ratio predicted using GLoBES, assuming the existence of a sterile neutrino with mixing angle sin 2 2θ ee and mass-squared splitting ∆m 2 41 . The covariance matrix V B includes statistical uncertainties and the 2.0% normalization uncertainty. The nuisance parameters ξ 15 and ξ 40 correspond to variations in the energy scale at 15 m and 40 m, respectively, with uncertainties of 2.0% apiece. In our fits, we minimize over these nuisance parameters for each value of sin 2 2θ ee and ∆m 2 41 . The results of our benchmark sterile neutrino analysis are shown in figure 26. The dark green, green and dark green contours represent the 95%, 99% and 99.9% C.L. contours, respectively. Relevant statistics are summarized in table 9. Bugey-3 shows a modest preference for a sterile neutrino; we find that the evidence rises to the level of 1.4σ.

DANSS
Implementation in GLoBES. The DANSS [68] reactor core is a 3.7-meter-tall cylinder with a diameter of 3.2 m. We assume production to be uniformly distributed, but that the radial distribution is given by the zeroth-order cylindrical Bessel function with zero at edge of core; we sample the vertical extent of the core following the burning profile in ref. [74]. The detector is a 1 m 3 cube located directly underneath the axis of the core; in our analyses, the centers of the core and the detector are taken to be either 10.7 m or 12.7 m apart, and we are ultimately interested in the ratio of the spectra obtained at these distances. We calculate the oscillation probabilities as described in appendix A. We calculate F (q) (see eq. (A.8)) on the range q ∈ [10 1 , These are the fuel fractions during the middle of a VVER-1000 reactor operating cycle as presented in table 1 of ref. [44].
Statistical analysis. Table 2 in ref. [68] tabulates the bottom/top spectral ratio, as well as the statistical errors on these ratios; we reproduce these data in figure 38b in appendix C. The DANSS collaboration has not published a thorough analysis of their systematic uncertainties, but the impact of these have been estimated in the final result. Specifically, the following systematics have been included: JHEP01(2021)167 • The energy scale uncertainty is taken to be 2.0%, and is fully correlated between the upper and lower configurations. This systematic is included via the GLoBES function glbShiftEnergyScale. A nuisance parameter is introduced corresponding to this energy scale shift.
• The uncertainty on the ratio of L −2 for the upper and lower positions is taken to be 2.0%. This introduces correlations between the spectral ratios.
The following chi-squared function is employed in these analyses: where S exp is the measured bottom/top ratio, S pred is the prediction of the same from GLoBES and V D is the covariance matrix describing these data. The nuisance parameter associated with the energy scale is ξ D and the corresponding uncertainty is σ D . In our analysis, we minimize over ξ D for each point in the sin 2 2θ ee -∆m 2 41 plane. The results of our analysis are shown in figure 27. The dark green, green and dark green contours represent the 95%, 99% and 99.9% C.L. contours, respectively. Relevant statistics are summarized in table 9. DANSS presents the most compelling single piece of evidence for a sterile neutrino of the experiments that we have considered -the evidence rises to the level of 3.0σ.

Daya Bay
Implementation in GLoBES. Daya Bay consists of an array of eight antineutrino detectors (ADs) and six nuclear reactors. Our analysis of the antineutrino spectrum at Daya

JHEP01(2021)167
Day is based on the 1958-day data release in ref. [69]. The data have been taken in three phases, corresponding to the active number of ADs: sequentially six (6AD; 217 days), eight (8AD; 1524 days) and seven (7AD; 217 days).
In their 1230-day data release [55], the collaboration reports the average power of each reactor core during the 6AD period and in the first 1013 days of the 8AD period; similar figures have not been published for the 1958-day data. We assume that (1) the average powers during the entire 8AD period are given by the average powers during the first 1013 days, and (2) that the average powers of during the 7AD period are equal to those from the 6AD period. We do not expect these assumptions to dramatically affect the outcome of this analysis, but we look forward to updating this in the future. The average powers during the 6AD and 8AD periods from the 1230-day data are tabulated in table 14.
The distances between pairs of reactors and detectors are tabulated in table I of ref. [55]; we tabulate these in table 13. We treat the reactor cores as being point-like, but the ADs are treated as 3-meter wide targets, the stated size of the acrylic vessel that contains the Gd-doped liquid scintillator [55]. The function F d (q) (see eq. (A.8)) for each detector d is written as follows: where r indexes the reactor cores, s indexes the three data-taking periods, t s d is the livetime of detector d during period s, P s r is the power of core r during period s and L dr is the distance between detector d and core r. The detector live-times are given as follows: • ADs 1-6 have t 6AD There are five factors that differentiate each detector: 1. For each detector, we define @time = 1.0 but, since not all detectors have been operational for the same amount of time, we fold these differences into the definition of @power. Namely, this is defined to be 3. The total efficiencies of the ADs also vary; these are given in the second row of table 15. We set @norm to be the stated value of ε tot for each detector.

JHEP01(2021)167
4. The effective baseline of each detector, L d , is defined via These are tabulated in the third row of table 15.
5. Lastly, the effective fuel composition visible to each detector varies; these are tabulated in the last four rows of table 15.
The energy resolution of each detector is taken to be identical. In the supplementary material to ref. [55], the Daya Bay collaboration publishes the response matrix that they use in their analysis. In principle, we could use the same response matrix here; to cut down on computation time, we parametrize the response using function of the form σ GeV = We are ultimately interested in the ratios of total events in EH2 (i.e., AD3 & AD8) and EH3 (AD4-7) to EH1 (AD1, 2), which we call EH2/EH1 and EH3/EH1. The spectrum observed at each EH is determined by simply summing the number of spectrum for each detector in the hall, accounting for the appropriate effective fuel fractions. Taking the ratios of these spectra, which we call S 12 pred and S 13 pred for EH2/EH1 and EH3/EH1, respectively, is then trivial.
We benefit from the extensive data release from the Daya Bay collaboration in the form of supplementary data to ref. [69]. In particular, the total numbers of observed IBD candidates (signal+background) for each bin in each experimental hall reside in the files DayaBay_IBDPromptSpectrum_EH<N>_1958days.txt, where <N> = 1, 2, 3; the background spectra have also been published and reside in the files DayaBay_BackgroundSpectrum_EH<N>_1958days.txt. We extract the observed ratios S 12 exp and S 13 exp ; we show these data in figures 39a and 39b, respectively. These are the basis of our chi-squared, defined as

JHEP01(2021)167
where S 12 and S 13 have been combined into one vector S and V DB is the covariance matrix, which accounts for both statistical and systematic errors including correlations. We describe our estimation of the covariance matrix V DB in what follows. The number of signal events in any of EH1, EH2 or EH3 (from data), which we call s A , is given by the difference between the total number of events t A and the expected number of background events b A ; here, A (= 1, 2, 3) indexes the experimental hall. The uncertainty on t A is statistical; the event rates are large enough where they can assumed to Gaussiandistributed. The uncertainty on b A , however, is considered a systematic uncertainty; we discuss it further below. Therefore, the statistical uncertainty on a given element of s A , which we call The correlation between S 12 i and S 13 i is also recorded. The collaboration provides a list of systematic uncertainties in table VIII of ref. [55]; in ref. [69], it was reported that the uncertainty on the 9 Li background has been reduced from 43% to 27% in EH1 and EH2. We have considered the contribution of each of these components, accounting for the stated level of correlations between detectors, reactors and other systematics, using Monte Carlo methods: the fractional change in the event rate is calculated by varying each systematic and the covariance matrix of the resulting pseudodata is recorded. In practice, we find that the largest contributions to the covariance matrix come from (1) variation in the backgrounds, and (2) varying the detector response. Unsurprisingly, the uncertainty on the HM flux predictions is found to be negligible -this is precisely the reason why the ratios of event rates have been used in this analysis.
As previously mentioned, the absence of specific information regarding the true average reactor powers and live-times for each detector during each operation period means that our predicted ratios of spectra will not agree with the true ratios, even in the absence of a sterile neutrino. To compensate for this, we introduce two calibration factors, one for each spectral ratio, which are ad hoc attempts to reproduce Daya Bay data; these are introduced via the replacements S 12 → ζ 12 S 12 and S 13 → ζ 13 S 13 . These factors are determined by minimizing the above chi-squared function assuming only three-neutrino oscillations with the best-fit three-neutrino parameters in ref. [46], and are found to be ζ 12 = 0.9933 and ζ 13 = 0.9973. A more accurate estimate of the sensitivity can be obtained using more accurate information regarding the cores and detectors.
The results of this analysis are shown in figure 28. The dark green, green and dark green contours represent the 95%, 99% and 99.9% C.L. contours, respectively. Relevant statistics are summarized in table 9. Clearly, the evidence for a fourth neutrino from Daya Bay spectra is rather weak.

Double Chooz
Implementation in GLoBES. Double Chooz consists of two monolithic, Gd-doped liquid scintillator detectors located in the vicinity of two commercial power reactors. The separations between the reactors and the detectors are taken from figure 1 of ref. [52]; the JHEP01(2021)167 distances are tabulated in table 23. Given the O(100 − 1000) m baselines, we safely assume that both reactors are point-like. However, the detector is taken to have a width of 3 m; this helps to ensure that fast oscillations average out in our calculations, and is consistent with the physical width of the detector.
The collaboration has not published the (relative) average powers of the two reactors during the data collection periods for either the near or far detector; we assume that the average powers are the same, and comment on this further below. We again calculate the quantity F (q) as in eq. (A.13), accounting for both reactors at each detector. This quantity is precomputed in steps of 0.01 in log 10 q; we calculate this for q ∈ [10 −3 , 10 1 ] for the near detector and q ∈ [10 −2.5 , 10 1 ] for the far detector.
Each detector is separately defined within GLoBES, using the information contained in table 23. We set @time to the operating time of the detector (in days), @power to 8.5 GW (assumed to be the same for the neat and far detectors) and @norm to be the efficiency. We set lengthtab to the corresponding value of 1/ L −2 , given by 0.4003 km and 1.052 km for the near and far detectors, respectively. The collaboration reports [52] that the near detector has 1.0042 ± 0.0010 times as many protons in its Gd-doped target as the far detector does, and 1.0045 ± 0.0067 times as many protons in its gamma catcher. Given the relative volumes of these two regions, we estimate that the near detector has 1.0044 times as many protons as the far detector. Consequently, we set $targetmass to be 1.0 for the far detector and 1.0044 for the near detector.
The energy spectrum in each detector is calculated in 26 bins between [1.

JHEP01(2021)167
Statistical analysis. Figure 4 of ref. [52] shows the measured ratio of spectra at the Double Chooz near and far detectors; we reproduce these data in figure 38c in appendix C. For our analyses, we have digitized these data and the corresponding statistical uncertainties. We also consider the following sources of systematic uncertainty: • Detection Uncertainties. These are enumerated in the third column of table 2 in ref. [52] and amount to a 0.475% uncertainty, assumed to be correlated between all energy bins.
• Flux Uncertainties. A 0.47% reactor-uncorrelated thermal power uncertainty has also been included. This results in a 0.4% uncertainty on the spectral ratio, assumed to be correlated between all energy bins.
• Backgrounds. Background rates at the near and far detector are given in table 3 of ref. [52] and the corresponding spectra are shown in figure 3 for both the near and far detectors. From these figures, we estimate the contribution of each background process to the background-subtracted event rate and the corresponding uncertainty using a simple Monte Carlo routine. This procedure yields the background-related uncertainty on the spectrum, including nontrivial correlations.
These are all included into a chi-squared function of the following form: where S exp and S pred are, respectively, the measured and predicted spectral ratios at Double Chooz. We note that we have reweighted the measured data by a factor i.e., by the ratio of the predicted numbers of events in either detector. The covariance matrix V DC contains the statistical and systematic uncertainties described above.
In the absence of published information regarding the relative powers of the two reactors over the operation of each detector, comparing S exp and S pred can be problematicit's unclear what the appropriate three-neutrino-oscillation baseline should be. To address this, we reweight our predicted spectrum by a factor ζ DC , determined as follows. We calculate S pred assuming only three-neutrino oscillations using the value of sin 2 θ 13 measured by Double Chooz in ref. [52], i.e, sin 2 2θ 13 = 0.105. We then replace S pred → ζ DC S pred and minimize over ζ DC ; we find ζ DC = 1.0026. We then include this factor in all subsequent calculations.
The results of our calculation are shown in figure 29. 13 The dark green, green and dark green contours represent the 95%, 99% and 99.9% C.L. contours, respectively. Relevant statistics are compiled in table 9. The evidence for a sterile neutrino at Double Chooz is weak -while the data show a clear deficit stemming from nonzero θ 13 , the data are sufficiently imprecise that any fine structure beyond the expected patter is washed out.

Implementation in GLoBES.
Details of the geometry of NEOS are taken from ref. [70]. The reactor core at unit 5 of the Hanbit Nuclear Power Complex, where NEOS is housed, is 3.1 m in diameter and 3.8 m in height. The detector is located in the plane 10.0 m beneath the center of reactor; labeling the center of core as the origin of our coordinate system and call the axis of the cylinder the z axis, the z coordinate of the detector is z = −10.0 m. The center-to-center distance is 23.7 m; the remaining coordinates of the center of the detector are taken to be x = +21.487 m and y = 0 m. The detector is 1.21-meter-long cylinder with diameter 1.03 m and is oriented so that its axis is in the xy plane, i.e., the axes of the detector and the core are orthogonal. Ref. [70] does not further specify the orientation of the detector axis in the xy plane, but the analysis is not particularly sensitive to this detail, given the size of the detector. For concreteness, we orient the detector along the x axis.
We estimate the distribution of baselines (more concretely, the distribution of 1/L 2 values between points in the core and the detector) using a simple Monte Carlo routine. The effective length to be L eff ≡ 1/ L −2 = 23.69 m. These distances are also used to calculate the function F (q) (eq. A.8) over the range q = 10 1 , 10 4 in steps of 0.01 in log 10 q.
NEOS, itself, requires one experiment file within GLoBESfit. The NEOS collaboration reports a prompt energy spectrum on [1.0, 10.0] MeV. We choose to ignore the last bin, covering [7.0, 10.0] MeV, in our analysis; there are very few events in this bin, so we do not expect to lose much information. The NEOS collaboration presents their data in terms of a ratio relative to the antineutrino spectrum measured at the Daya Bay near detectors in JHEP01(2021)167 ref. [54]; see figure 3(c) of ref. [70]. 14 The spectrum that we use to normalize the NEOS result is the sum of the antineutrino spectra in AD1 and, AD2, i.e., the detectors in EH1. The ratio of NEOS data relative to the Daya Bay spectrum is shown in figure 38d in appendix C.
Since we are again interested in a ratio of spectra, @time, @power, @norm and $target_mass are all fixed to be 1.0. We use the same parametrization of the energy resolution at NEOS as ref. [75]; however, since GLoBES assumes all energies are in GeV, we write the resolution as σ GeV = 0.00012 + 0.00158 E GeV . (5.10) Lastly, we fix $lengthtab = 0.02369 km, as described above. The spectrum is calculated for each of the four main fissile isotopes, and these are added together, weighted by the relevant fuel fractions [70,75]: Statistical analysis. The object in which we are ultimately interested is the double ratio of NEOS and Daya Bay spectra, to wit, where S A nν, i is the number of antineutrino events calculated using GLoBES for experiment A (= NEOS; DB, EH1) under the assumption of the existence of n neutrinos in bin i, defined using NEOS's 0.1-MeV binning. The justification for this is as follows.
As mentioned, the NEOS collaboration normalizes their spectrum relative to the Daya Bay measured flux. The latter has been inferred from the Daya Bay 1230-day data after unfolding three-neutrino oscillations. To properly account for the presence of a fourth neutrino in this denominator, one must first refold the best-fit three-neutrino oscillations before unfolding four-neutrino oscillations, with some assumptions about the new mass and mixing. This amounts to multiplying by the ratio of the expected numbers of events in a given bin at the Daya Bay near detectors with and without a sterile neutrino. Additionally, because the Daya Bay prediction has been rescaled to the NEOS data, there should also appear a factor of the ratio of oscillation probabilities at NEOS with and without oscillations. We are thus led to consider eq. (5.11), which has been previously considered in refs. [12,76].
We use EH1 from Daya Bay to provide the rescaling of the measured flux in eq. (5.11) in our calculations. The oscillation probabilities at NEOS are calculated assuming that only oscillations into the sterile state are relevant, but the full four-neutrino machinery is JHEP01(2021)167 applied to Daya Bay. The following chi-squared is used in order to study the extent to which NEOS can probe the existence of a sterile neutrino: where S exp are the NEOS data, S pred is the ratio calculated using GLoBES and V N is the covariance matrix of these data. As with DANSS, we introduce an explicit nuisance parameter ξ N for the energy scale uncertainty, whose uncertainty is nominally 0.5% [70]. This is studied using the GLoBES function glbShiftEnergyScale; in our fits, this parameter is minimized for every point in the sterile neutrino parameter space.
We turn now to the covariance matrix V N . There are two contributions that we include in this matrix. The first is the statistical uncertainties on the ratio S exp , which are simply read off of figure 3(c) in ref. [70]. The second is the covariance matrix published with the antineutrino spectrum in ref. [54]. Daya Bay publishes their results using a different binning than NEOS; we reformulate the covariance matrix using a simple Monte Carlo routine. Random antineutrino spectra are generated using the Daya Bay spectrum and covariance matrix, including correlations. The antineutrino spectrum that NEOS would see is determined by interpolating these generated spectra to the stated NEOS binning. The contents of these bins using the above-stated energy resolution, and the resulting covariance matrix is calculated.
Formally, we need to consider the predicted spectrum given the stated NEOS fuel fraction, not the Daya Bay fuel fraction from which this result is derived. This introduces model dependence on the HM fluxes -and with it, dependence on the uncertainties on these fluxes. These contributions have been included with a similar Monte Carlo routine, but their contribution to the final result is small [75].

JHEP01(2021)167
It bears mentioning that the use of the Daya Bay antineutrino spectrum to normalize the NEOS spectrum implies that these experiments are formally correlated with one another. However, we ignore the correlations between these experiments for the following reason. The Daya Bay antineutrino spectrum used at NEOS is based on the 1230-day data; in contrast, the Daya Bay analysis described in section 5.3 is based on the 1958-day data. Consequently, it's difficult to accurately assess the degree of correlation between these data sets. We hope to improve this aspect of our analysis with the collection and dissemination of more data.
The results of this analysis are shown in figure 30. The dark green, green and dark green contours represent the 95%, 99% and 99.9% C.L. contours, respectively. Relevant statistics are compiled in table 9. The sensitivity at low ∆m 2 41 ( 0.3 eV 2 ) is driven by the use of the Daya Bay spectrum for normalization; NEOS itself drives the sensitivity in the ∼ 1 eV 2 region. One could, in principle, develop an analysis based on the ratio of NEOS data to the HM flux prediction, the data for which are shown in figure 3(b) of ref. [54]; this is precisely what is done in, for instance, ref. [67]. The exclusions derived there only have support in the region around ∼ 1 eV 2 , but, as mentioned above, are more dependent on the uncertainties of the HM fluxes.

RENO
Implementation in GLoBES. The geometry of the RENO experiment [61] has been previously discussed in the context of our rate analysis. We continue to assume that each of the six reactors is point-like, but that both the near and far detector have a width of 3.0 m, to facilitate fast oscillations in averaging out. The distances between either detector and each of the six reactors are shown in table 18; the average powers of each reactor during the operating period of either detector are shown in table 19. These data have been obtained from private communications with the collaboration [59]. As with previous experiments, we precalculate F (q) (eq. (A.13)) on a grid with spacing 0.01 in log 10 q over the range q ∈ [10 −2.5 , 10 1 ] for both detectors.
The near and far detectors are defined as separate experiments within GLoBES. For each, we set @time to the operating time of each detector (in days), @power to the total thermal power from all six reactors and @norm to be the efficiency of each detector. The relevant quantities are tabulated in table 19. In the absence of any information to the contrary, we assume the detectors to be of equal mass; consequently, we set $target_mass to be 1.0 for each. We set $lengthtab equal to the appropriate value of Statistical analysis. Our analysis centers around the spectral ratio presented in the bottom panel of figure 2 of ref. [61]. These data and the corresponding statistical uncertainties have been digitized for use in our analyses; we reproduce these data in figure 38e in appendix C. Aside from these statistical uncertainties, we consider the following sources of systematic uncertainty: • Energy Scale. The collaboration claims a 0.15% uncertainty on the experimental energy scale [61]. We include this in our analyses via a nuisance parameter.
• Flux Uncertainty. A 0.9% uncertainty on the thermal power of each reactor has been included. Assuming that this is totally uncorrelated between cores, we estimate that this implies a ∼ 0.3% uncertainty on the far-to-near ratio.
• Backgrounds. Table 1 of ref. [61] gives the estimated total background rates at the near and far detectors, and the insets to figure 1 of the same reference show the background spectra. We have digitized these event spectra in order to estimate the contributions of the backgrounds to the spectral ratio via Monte Carlo. The resulting uncertainties and correlations are recorded for use in our analyses.
Once more, we employ a chi-squared function of the form where S exp and S pred are the experimental and predicted IBD spectra, respectively. We reweight the measured data by a factor of to ensure consistency in our GLoBES analysis. The covariance matrix V R includes the statistical and systematic uncertainties described above, neglecting the energy scale uncertainty, including correlations. The nuisance parameter ξ R describes adjustments to the energy scale at RENO, accounting for its uncertainty σ R = 0.15%. The effect of the energy scale shift is included JHEP01(2021)167 through the GLoBES function glbShiftEnergyScale. In our calculations, we minimize χ 2 RENO with respect to this nuisance parameter for every point in the sin 2 2θ ee -∆m 2 41 plane. In an attempt to faithfully reproduce this experiment in GLoBES, we reweight our predicted spectrum by an ad hoc factor ζ R , determined as follows. We have simulated the IBD spectra at RENO assuming the best-fit value of sin 2 2θ 13 determined in ref. [61], i.e., sin 2 2θ 13 = 0.0896. We then replace S pred → ζ R S pred , and minimize over ζ R . We find the minimum at ζ R = 1.0088. We then use this value of ζ R in all subsequent calculations.
The results of our calculation are shown in figure 31. 15 Relevant statistics are compiled in table 9. The evidence for a sterile neutrino from RENO is relatively weak.

Spectral analyses
In this section, we present and discuss our analyses of the experiments presented in the previous section. In particular, we consider several groupings of these experiments and assess the significance of the evidence for the existence of a sterile neutrino.

Aggregating results
In table 9, we combine relevant statistical quantities from our sterile-neutrino analyses of the experiments described in section 5. We also tabulate the locations of the best-fit points in the sin 2 2θ ee -∆m 2 41 plane for each experiment; these suppressed in figures 26-31 for clarity.

Combined sterile neutrino analysis
The exclusion contours from a combined analysis of these spectral measurements is shown in figure 32 Table 9. A summary of all results for spectral analyses. While introducing a sterile neutrino has significantly improved the fit, the threeneutrino hypothesis is not obviously insufficient -for 212 degrees of freedom, the above value of χ 2 3ν corresponds to p = 0.88. All of Bugey-3, Daya Bay, Double Chooz and RENO have values of χ 2 3ν that are less than the number of data points used in each analysis; simply put, these experiments do not convey a need to introduce a sterile neutrino. On the other hand, DANSS and NEOS ostensibly show a moderately strong preference for a sterile neutrino, as previously discussed (particularly the former); these experiments drive the global preference for a sterile neutrino, as has been previously observed [12,67,76]. The best-fit point in figure 32 is at (sin 2 2θ ee , ∆m 2 41 ) = (3.80 × 10 −2 , 1.26 eV 2 ), which is roughly consistent with the best-fit point from the spectral analyses in refs. [12,67].

Additional analyses
In this subsection, we discuss particularly interesting subsets of the spectral-ratio data and the extent to which these indicate the possible existence of a sterile neutrino.
Calibration: measuring sin 2 2θ 13 and ∆m 2 31 . We begin this section by discussing how we verify that our GLoBES analyses of Daya Bay, Double Chooz and RENO reproduce the experimental measurements of the three-neutrino parameters sin 2 2θ 13 and ∆m 2 31 in the absence of a sterile neutrino. We scan over these parameters and calculate the χ 2 precisely as described above; the results are shown in figure 33. The dark (light) regions represent the 95% (99%) C.L. contours derived for each of Daya Bay (red), Double Chooz (purple) and RENO (blue). The gray curves represent a combined analysis of all three of these.
These regions are broadly consistent with the best-fit values found in the analyses of the respective collaborations [52,61,69]. 16 This figure can also be compared against figure 5 of ref. [46], where a combined analysis of these medium-baseline experiments has also been performed. We again find general agreement between the calculated regions. The correspondence is not exact -for instance, the region preferred by Double Chooz is slightly wider here than the corresponding region in ref. [46] -but given that these 16 We elect to frame this analysis in terms of ∆m 2 31 instead of ∆m 2 ee . The precise relationship between these two quantities has been debated in the literature [77,78], but this is not relevant for our purposes here. fits have been performed with often incompletely reported information, we consider this a modest success.

JHEP01(2021)167
One may worry that introducing a sterile neutrino may cause a significant mismeasurement of sin 2 2θ 13 at reactor experiments. We have investigated how sensitivity to a sterile neutrino changes if sin 2 2θ 13 is unfixed from its best-fit value, and how introducing a sterile neutrino may shift the preferred value of sin 2 2θ 13 . 17 We find, however, that this is not the case. The measurement of sin 2 2θ 13 is driven primarily by Daya Bay (see figure 33), whereas the measurement of sin 2 2θ ee is driven primarily by DANSS, and that these measurements are largely uncoupled when performed simultaneously. This validates our fixing sin 2 2θ 13 in our analyses above.

DANSS and NEOS.
Given that DANSS and NEOS drive the sensitivity to a sterile neutrino in our global fit, it seems pertinent to ask how the combination of just these two experiments constrains the sin 2 2θ ee -∆m 2 41 plane. The resulting 95%, 99% and 99.9% C.L. contours are shown in dark green, green and light green, respectively, in figure 34. Relevant statistics are compiled in table 9 in the row headed "DANSS + NEOS." Restricting to these two experiments has resulted in p = 9.2 × 10 −4 , corresponding to 3.3σ. Notice, however, that the value of the chi-squared at the best-fit point, χ 2 min , is 84.9. For 82 degrees of freedom, this corresponds to p = 0.39; this is an acceptable fit, but hardly compelling evidence for the existence of a sterile neutrino. Moreover, the best-fit points from our analyses of DANSS and NEOS do not coincide, and the best-fit point from their combination lies close to that from DANSS alone. Consequently, we have investigated the 17 Given that accelerator neutrino experiments also provide a measurement of ∆m 2 31 , and that these measurements are consistent with the value measured at reactors, we leave this parameter fixed at its best-fit value for this study. compatibility of these data sets using a parameter goodness-of-fit test [79]. We define

JHEP01(2021)167
where χ 2 A is the minimum value of the chi-squared for experiment A. We find χ 2 PG = 6.4; assuming this is chi-squared distributed implies p = 4.0 × 10 −2 . These data sets seem to be in mild tension. While parameter goodness-of-fit values should be viewed with some skepticism (see, for instance, ref. [16]), it bears mentioning that the evidence for a sterile neutrino from this experiments, while modestly strong, is not entirely overwhelming.
Daya Bay and NEOS. Given the manner in which we have been forced to include NEOS -namely, as a ratio with respect to Daya Bay's EH1 -one could reasonably object that showing results from NEOS on its own has stripped this analysis of valuable context. To this end, we have performed a combined analysis of Daya Bay and NEOS, the results of which we show in figure 35. The resulting 95%, 99% and 99.9% C.L. contours are shown in dark green, green and light green, respectively, and relevant statistics are compiled in table 9 in the row headed "Daya Bay + NEOS." While the best-fit point to the NEOS data unto itself lies in the region around ∆m 2 41 ∼ 10 −1 eV 2 , this part of the parameter space is disfavored by the medium-baseline experiments, including Daya Bay. Combining Daya Bay and NEOS makes this apparent; while this part of the parameter space is still allowed, much more of the sterile neutrino parameter space is included in the at 95% C.L. region.
As mentioned in section 5, we have ignored possible correlations between Daya Bay and NEOS in our analysis because they are hard to quantify exactly. Consequently, this analysis is tantamount to adding together the individual χ 2 maps over the sterile neutrino parameter space. We note, however, that accounting for these correlations is one way in JHEP01(2021)167 which this analysis can be strengthened in the future; we look forward to improving this treatment in future versions of GLoBESfit.
Analyzing modern experiments. In our analyses of rate experiments, we considered how considering only experiments from the 2010s alters the evidence for a sterile neutrino. In the interest of parity, we perform a similar analysis here. For the experiments that we consider, this is tantamount to removing Bugey-3 from the fit.
We show the results of this analysis in figure 36. As before, the 95%, 99% and 99.9% C.L. contours are shown in dark green, green and light green, respectively. Relevant statistics are compiled in the last line of table 9. It is clear that removing Bugey-3 from the fit has not dramatically changed the best-fit region (cf. the dark green region in figure 32). In fact, the fit is slightly stronger for Bugey-3's omission -the preference for a sterile neutrino rises to 3.3σ which, while not much higher than 3.1σ, is an increase relative to the analysis of all experiments. The reason for this is clear from figure 26: bugey-3 indicates no preference for a sterile neutrino at ∆m 2 41 = 1.26 eV 2 . In fact, our analysis implies that the best-fit point from our analysis of modern experiments is less favored than zero mixing by the Bugey-3 data.
We emphasize that we know of no concrete reason why any experiment performed before 2010 (in this case, Bugey-3) should be disregarded in any of the fits we have performed. We mean merely to demonstrate that if one were suspicious of these data, for whatever reason, then their inclusion has not dramatically enhanced the preference for a sterile neutrino -the evidence is, in fact, largely driven by experiments from the past decade.
Energy scale uncertainties at Bugey-3, DANSS and NEOS. An important source of systematic uncertainty in these experiments is the energy scale of the detector. While

JHEP01(2021)167
the energy scale of liquid scintillator detectors is nonlinear, we assume in our analyses that the energy response is linear, and that the only uncertainty is on the slope of this energy response. This is precisely the meaning of the nuisance parameters ξ 15,40,D,N in section 5.
Loosely speaking, the uncertainty on the energy scale translates into uncertainty on the inferred value of ∆m 2 41 ; a systematic shift in reconstructed values of E ν induces a corresponding offset in the measured value of ∆m 2 41 . If the energy scales were changed at these experiments, then the fringe-like structures in the resulting confidence level contours would similarly be shifted up and down in figures 26-31.
As we have discussed, the evidence for a sterile neutrino is driven by the combination of DANSS and NEOS. This can be seen from figures 27 and 30 by looking at the fringes in the confidence level contours: the best-fit point resides in the region where the patterns of fringes line up and leave a gap in the sterile neutrino parameter space. One could then ask if the patterns of fringes can be moved relative to one another to produce an allowed region elsewhere. Conversely, we have also seen that Bugey-3 presents a mild challenge to the sterile-neutrino interpretation of modern spectral experiments. One could also ask if allowing the energy scale to vary at Bugey-3 allows for this tension to be mitigated.
Unsurprisingly, we find that shifting the energy scales within their stated error budgets does negligibly little to address either of these concerns -clearly, this would have already appeared in our analyses, wherein the energy scales are allowed to vary as nuisance parameters. One could, however, ask how increasing these uncertainties modifies our analyses. We have repeated these analyses with the energy scale uncertainties all increased to 20%. 18 Even here, we find that the resulting patterns of fringes have not appreciably moved in the ∆m 2 41 direction relative to the nominal analyses. Therefore, we are led to conclude that energy scale uncertainties do not contribute meaningfully to the preference for a sterile neutrino in these experiments.

Sterile neutrinos and the 5 MeV bump
The presence of an unexplained spectral feature at 5.0 MeV in the absolute prompt energy spectrum, independently observed by several experiments [22][23][24], has caused significant interest in recent years. In a sense, the so-called 5 MeV bump is a microcosm of the current situation regarding reactor antineutrino anomalies: there is a lingering suspicion that some unaccounted-for nuclear physics effect is operative, but precisely how this filters into the hunt for sterile neutrinos is unclear. In ref. [80], it was proposed that the bump could arise from the process 13 C(ν, ν n) 12 C in the presence of some new interaction. While this interaction could, in principle, explain the excess, it was found that concrete models of potential new physics that could provide such an interaction are already strongly constrained. This all suggests that some combination of nuclear physics and detector effects are a much more likely explanation of the bump [81].
Still, if the bump is indeed the result of a misunderstanding of the antineutrino flux, then this does not imply that a sterile neutrino does not exist. However, it is important

JHEP01(2021)167
to understand the effect that the bump has on searches for sterile neutrinos; this is the subject of this section. In the first subsection, we study a subset of the global dataset indicating the existence of the bump to determine how strong the evidence for this bump is. In the second, we present sterile neutrino analyses of the global reactor dataset with a phenomenological parametrization of the bump, and discuss the results.

The evidence for the bump
We begin by considering the most relevant subset of the evidence for the existence of the bump. In particular, we consider the following three measurements: 1. The measurements of the isotopic prompt energy spectra for 235 U and 239 Pu inferred from Daya Bay burn-up measurements in ref. [28]. We ignore the first three and the last two bins for each of these spectra, where experimental systematic uncertainties can be large. 3. The 235 U spectrum measured by PROSPECT in ref.
[29]. 19 We make extensive use of the supplementary material published by the Daya Bay and PROSPECT collaborations, as pertains to these measurements. For the RENO data, we have digitized figure 5 of ref. [60] for our analysis. More specifically, Daya Bay and PROSPECT have published covariance matrices for their measured spectra, including nontrivial correlations; for RENO, we assume that the data are uncorrelated in the absence of concrete information. We ask how well these data are described by the HM fluxes and determine the extent to which introducing a bump affects the quality of this fit. To these ends, we use the Markov Chain Monte Carlo (MCMC) package emcee [82] to perform two types of analyses. In the first, we allow only the magnitudes of the isotopic fluxes of 235 U and 239 Pu to varymultiplied by quantities r 235 and r 239 , respectively -while maintaining the shapes given by the HM fluxes. In the second, we add a separate Gaussian feature to each of these isotopic fluxes. The Gaussians themselves are normalized to unit area, though we multiply them by n 235 and n 239 , respectively; we simplify our analysis by fixing the means of these Gaussians to 5.8 MeV antineutrino energy (corresponding to 5.0 MeV prompt energy), and we assume they share a common width σ bump . We scan over the appropriate parameter spaces and determine the posterior probability densities for each case, and compare the maximum likelihoods.
We have considered two combinations of the data mentioned above. For the first, we simultaneously analyze Daya Bay and RENO; for the second, we add PROSPECT into the analysis. The PROSPECT spectrum of ref.
[29] has been presented with arbitrary normalization. To account for this, we introduce a free-floating parameter η to rescale JHEP01(2021)167   the predicted 235 U spectrum at PROSPECT, independent of r 235 above. table 10 shows relevant statistics from our analysis performed in the absence of a bump. We present the maximum value of the likelihood L in terms of the minimum value of −2 log L. Shown also are the number of degrees of freedom n dof , and the p-value derived assuming (−2 log L) min is Gaussian-distributed. Moreover, table 11 shows the same quantities for our analyses in which bumps in the isotopic fluxes are introduced. Clearly, the current data greatly prefer this sort of bump over the pure HM fluxes.
More of the structure of the fits in which a bump is invoked is shown in figures 40 and 41 in appendix C for our analyses without and with PROSPECT, respectively. In the two-dimensional planes, the blue, orange and red contours represent the 68.3%, 95% and 99% credible regions (C.R.), respectively. Additionally, the one-dimensional plots also show the 90% C.R. curve in green. Firstly, both analyses prefer the HM fluxes to be rescaled downward in magnitude, particularly for r 235 . Secondly, while the data are largely consistent with the absence of a bump in 239 Pu, nonzero n 235 is preferred at 7σ -the data unequivocally prefer a spectral distortion.
The fit to Daya Bay and RENO in the presence of a bump is fairly high quality (p = 0.70) but degrades quite steeply (p = 2.9 × 10 −2 ) when PROSPECT is introduced. The reason is straightforward. The Daya Bay and RENO analyses prefer for the absolute IBD yields from 235 U to be scaled down by ∼ 10%. 20 This primarily stems from the deficit of events observed at low energies at Daya Bay, whereas the spectrum above E ν ∼ 5 MeV is largely consistent with the HM prediction (see figure 1a). On the other hand, the PROSPECT spectrum is reasonably consistent with the shape of the HM prediction for the 235 U flux, to within modest-sized uncertainties, over the entire energy range, particularly in the low-energy region. Clearly, the fit cannot successfully reconcile these differing lowenergy behaviors. The issue of the reactor bump may be a commentary on the low-energy spectrum as much as the high-energy spectrum.

JHEP01(2021)167
For context, the PROSPECT data, unto themselves, yield (−2 log L) min ≈ 54; for n dof = 29, this corresponds to p = 3.2 × 10 −2 . It bears mentioning that PROSPECT is still collecting data -the discrepancy between the PROSPECT and Daya Bay measured 235 U spectra may well be resolved with increased statistics and improved analysis techniques.

Sterile neutrino analyses with the bump
We turn now to the issue of how the inclusion of a bump in the isotopic flux of 235 U alters the evidence for the existence of a sterile neutrino. We will consider both rate and spectral analyses in turn.
We begin by considering the effect of the 5 MeV bump on our rate analyses in sections 3 and 4. Antineutrinos in the range [5.3, 6.3] MeV are predicted to constitute 13.5% of IBD events at these experiments. If the 235 U flux is augmented with a Gaussian bump of the sort indicated by our Daya Bay/RENO/PROSPECT analysis, i.e, a Gaussian bump with height n 235 = 5.915 × 10 −3 (cf. figure 41), then this factor becomes 14.4% -a 0.9% increase. Moreover, since many of the experiments considered here have fission fractions of the order f 235 ≈ 0.5, we see that adding this sort of Gaussian bump increases the predicted IBD rate by 0.5%. This will not be enough to explain the entire deficit with respect to the HM prediction, as we will see.
On the other hand, we have the measurements of ratios of spectra in sections 5 and 6. As argued previously, we expect these measurements to be largely insensitive to the details of the antineutrinos flux. Therefore, we do not expect the 5 MeV bump to significantly affect searches for sterile neutrinos in this channel, either.
We have confirmed this suspicion by calculating the distribution of χ 2 over the sin 2 2θ ee -∆m 2 41 plane -precisely as has been done previously -with the additional contribution from a Gaussian bump in 235 U. We vary the height of the bump between 0.0 and 10 −2 in the units of figures 40 and 41, and have determined that the minimum value of the χ 2 changes by less than one unit over this range. Specifically, we find χ 2 min (n 235 = 0) = 174.8 21 and χ 2 min (n 235 = 10 −2 ) = 175.5. The bulk of this difference is attributable to NEOS, given their unconventional approach to normalizing their spectrum. Moreover, we find that the statistical significance, in terms of nσ, does not vary meaningfully from 3.1σ.
We are led to conclude that the presence of the 5 MeV bump does not substantively impact searches for sterile neutrinos at reactor experiments, because (1) the contribution of events contained within the bump is at the sub-percent level, and (2) spectral ratios are, by design, largely insensitive to the particulars of the flux model employed.

Regarding Neutrino-4
Neutrino-4 [83,84] is a liquid-scintillator antineutrino detector located at the SM-3 reactor in Dimitrovgrad, Russia. The moveable detector covers a range of distances -between 6 and 12 m -from a 100 MW core. Recently, the collaboration has reported [83] results that JHEP01(2021)167 Figure 37. A comparison between our spectral results, recent results from Neutrino-4 [84] and a global analysis of tritium-endpoint experiments [85]. We show our 95%, 99% and 99.9% C.L. contours in dark green, green and light green; the 1σ, 2σ and 3σ C.L. contours from Neutrino-4 in dark blue, blue and light blue; and the 90% and 99% C.L. contours for tritium experiments in solid and dashed black. imply 3.5σ evidence for the existence of a sterile neutrino; preliminary results presented in December 2019 [84] suggest that this feature persists with the collection of more data.
The data currently available to the public are insufficient to allow us to attempt to replicate the collaboration's analysis. However, we can get a sense for how compatible Neutrino-4 is with the other experiments that we have considered by overlaying their respective allowed contours. We show precisely this in figure 37. The dark green, green and light green curves are, respectively the 95%, 99% and 99.9% C.L. contours from our combined analysis of measured spectral ratios from Bugey-3, DANSS, Daya Bay, Double Chooz, NEOS and RENO. The dark blue, blue and light blue curves are the 1σ, 2σ and 3σ allowed contours shown on slide 30 of ref. [84]; we caution that these results are preliminary, but proceed regardless. We also show the 90% (solid black) and 99% (dashed black) C.L. contours from the combined analysis of the Mainz, Troitsk and KATRIN tritium-endpoint experiments presented in ref. [85], where a similar comparison has been performed on the basis of Neutrino-4 results from ref. [83].
As observed in ref. [85], the best-fit point from Neutrino-4 (sin 2 2θ ee = 0.31, ∆m 2 41 = 7.32 eV 2 ) is disfavored, to a moderate degree, by the tritium-endpoint experiments. Similarly, the best-fit point to Neutrino-4 data is inconsistent with the region preferred by reactor spectral ratios. Moreover, while not shown in figure 37, the Neutrino-4 best-fit point is also disfavored by the integrated rate measurements presented in section 6 -in particular, see figure 22. The degree of (in)compatibility has been tabulated in table 12.
For each of our three rate analyses and for our spectral analysis, we show the difference in χ 2 between our best-fit point and that of Neutrino-4 (∆χ 2 N4 ), as well as the corresponding p-value and the equivalent number of σ.  Table 12. The (in)compatibility between the best-fit point from the Neutrino-4 analysis presented in ref. [84] and the analyses we've presented here.

JHEP01(2021)167
Clearly, none of the four analyses are particularly tolerant of this result from Neutrino-4. Our spectral analysis disfavors the best-fit point at the 3.3σ level, but all three rate analyses disfavor this point at 4σ. It is perhaps imprudent for us to focus solely on the best-fit point from Neutrino-4: significant portions of the 3σ-preferred parameter space cannot be firmly refuted by the tritium-endpoint experiments or by our analyses. However, in the absence of enough information to perform our own analysis of Neutrino-4, it is difficult to precisely discuss the global compatibility of these results. Interestingly, the updated analysis of ref. [84] contains a 3σ-preferred region around ∆m 2 41 ∼ 1 eV 2 that was not present in ref. [83]. If confirmed in an official publication, then this would be broadly consistent with the best-fit region from our analysis of spectral ratios. It is our hope that an extensive data release will allow us to study Neutrino-4 in more depth in the future.

Conclusions
In this work, we have studied the evidence for the existence of sterile neutrinos contained in the global reactor antineutrino dataset using a new open-source, GLoBES-based toolkit that we call GLoBESfit. We have demonstrated that inferences from IBD rate measurements are confounded by the necessary dependence on a particular flux model. In particular, while the evidence obtained using the traditional HM fluxes is at the level of 2.5σ, updated ab initio fluxes cannot meaningfully distinguish between three-and four-neutrino oscillations, while updated spectral-conversion fluxes yield 2.6σ significance. This latter figure is particularly noteworthy as one of the key findings of this work. As previously discussed, there have been claims in the literature that including forbidden decays in the reactor antineutrino spectrum would cause the error budget to grow to the point that any potential indication of a sterile would be washed out. However, this is clearly not the case; while the predictions do become slightly more uncertain, the increase in the overall predicted fluxes more than compensate for this. The effect is the increase in the evidence in favor of a sterile neutrino we have presented here, in stark contrast to the indifference exhibited by the ab initio fluxes. These diverging preferences can only be resolved with continued improvements to predictions of reactor antineutrino fluxes, which in turn rely on improved data.
Ratios of measured IBD spectral, on the other hand, seem to paint a more optimistic picture: taken together, the current dataset suggests that the evidence rises to the level of 3.2σ. This is certainly tantalizing, but hardly ironclad. We underscore that the statistical techniques we have applied to these spectral measurements are oversimplified, to the effect JHEP01(2021)167 that they will produce overconfident results. This will be discuss in more detail below. Still, we are led to conclude that the issue of the existence of additional species of neutrinos cannot be resolved one way or another with current data. We have also addressed the 5 MeV bump, and have found that the presence of this feature does not dramatically alter inferences about sterile neutrinos.
In addition to presenting our results, this manuscript is also intended to document GLoBESfit. This software is available for download from either GitHub [86] or from our dedicated website [87]. The function of making GLoBESfit publicly available is twofold: 1. Publishing our code allows for members of the community to levy informed commentary on our techniques. Global fitting in an inherently tricky business; reproducing an experiment's results can be challenging for those not involved in the collaboration.
Invariably, there are aspects of this program that can be improved. Our hope is that public input, over time, will reinforce this work.
2. Though we intend to revisit and expand on this work in the future, it is not possible for us to test every interesting physical hypothesis that could explain current neutrino anomalies. We have focused on the 3+1 scenario because of its simplicity, but Nature could certainly be far more interesting than this. If a user is interested in testing a particular model, then the tools we have developed ought to allow them to at least start that process. Because GLoBESfit is developed using pre-existing GLoBES architecture, it should lend itself to much a wider set of applications than the relatively simple scenario we have considered here.
Our hope is that making these tools available contributes meaningfully to the wider neutrino physics community.
Regarding the use of statistics. A common sin in global analyses is assuming that ∆χ 2 is chi-squared distributed. This issue has been discussed in detail in the literature; see, for instance, refs. [88][89][90]. Analyses of spectral ratios are particularly susceptible to this statistical idiosyncrasy, as is demonstrated nicely in figure 5 of ref. [89]; we direct the interested reader to this work for more details. The primary issue is that while the expected ratio of spectra is flat in the absence of a sterile neutrino, statistical noise invariably results in a preference for oscillations with some nonzero amplitude. Failing to take this into account yields final results that are overconfident. In this work, we have been conservative in talking about the statistical significance of our findings, particularly those in section 6. While the significance we have presented here ostensibly rises to the ∼ 3σ level, the true significance is certainly less than this. The actual preference for a sterile neutrino in the global reactor dataset, accounting for this effect, remains to be determined. We relegate a detailed study of these effects to future work, but offer some commentary here. The difficulty is that the precise relationship between ∆χ 2 and a p-value depends strongly on the experiment(s) under consideration. How ∆χ 2 is distributed -and thus the p-value associated with a particular measurement -can be determined numerically as a function of sin 2 2θ ee and ∆m 2 41 using Bayesian methods; this is the basis of the Feldman-Cousins technique [88]. However, this requires the ability to reliably simulate the ex-JHEP01(2021)167 periment(s) under consideration; the capacity to do so is limited for those not involved in a given collaboration. Even with perfect simulation capabilities, however, this sort of analysis can be prohibitively computationally demanding, particularly if considering multiple experiments simultaneously. We hope to address these concerns in future releases of GLoBESfit, but for the time being, we are left wanting for a more sophisticated statistical treatment. 22 While we believe that the results we have presented here are merited in their own right, our hope is that presenting the results as we have -and acknowledging the ways in which the underlying treatment is deficient -is sufficient to promote dialogue between experimentalists and theorists on how to best employ the available data. It is incumbent on theorists to ensure that experimental data is used correctly in their analyses; it is incumbent on experimentalists to follow good data preservation practices. Decisions for how to best commit the neutrino community's finite resources ultimately rely on understanding what is (and is not) contained in current data.

A.1 Oscillations with a sterile neutrino
The generic expression for the vacuum oscillation probability P αβ -the probability for a neutrino produced in the flavor eigenstate ν α to be observed in the flavor eigenstate ν β (α, β = e, µ, τ, s) -is given as follows: where U αi (α = e, µ, τ, s; i = 1, 2, 3, 4) are the elements of the leptonic mixing matrix, L is the distance propagated (i.e., the baseline), E ν is the neutrino energy and we define ∆m 2 ij ≡ m 2 i − m 2 j , with m i the neutrino masses. For oscillations involving antineutrinos, we make the replacement U → U * ; this is equivalent to changing the sign of the last term in eq. (A.1).
There are only four relevant elements of the leptonic mixing matrix, for our purposes here. We parametrize them as follows: one can easily verify the unitarity constraint 4 i=1 |U ei | 2 = 1. In our analyses, we have assumed that the values of θ 12 and θ 13 determined from global fits to oscillation data assuming three-neutrino oscillations [46] can be applied without alteration. Of course, what experiments actually measure is (some combinations of) the elements of the leptonic mixing matrix. Consequently, a determination of, say, |U e2 | 2 is absolute, whereas the inferred value of θ 12 will depend on values of θ 13 and θ 14 inferred elsewhere. However, as long as θ 14 is small -which we find to be the case in our fits -the implied modifications to the three-neutrino mixing angles are also small. When (anti)neutrinos propagate through matter, the background potential from ambient protons, neutrons and electrons modifies their propagation. The effects of this matter potential can be included via modifications to the mixing angles and mass-squared differences. Moreover, GLoBES does contain the functionality to calculate neutrino propagation in a matter background. Given the low energies under consideration here, however, we assume that the vacuum-oscillation formalism is sufficient. electron-type survival probability, P ee : where sin 2 2θ ee = 4|U e4 | 2 (1 − |U e4 | 2 ) = sin 2 2θ 14 is the effective active-sterile mixing angle. We work in terms of sin 2 2θ ee even though this is equal to sin 2 2θ 14 in our parametrization; the effective angle sin 2 2θ ee is parameterization independent and thus more appropriate for comparison with the broader literature. For medium-baseline experiments, this two-flavor formalism is insufficient -we consider the full, four-flavor formalism. The electron-type survival probability is instead written as this can be immediately derived from eq. (A.1) by setting α = β = e.
For the experiments we consider, we often cannot consider oscillations with one welldefined baseline: either (1) the experiment is sourced by multiple reactors, or (2) the physical extent of the core or detector (or both) is not much smaller than the distance between them. In either case, one must generalize the term sin 2 ∆m 2 L 4Eν to a flux-weighted average over baselines. The appropriate quantity is where we define q ≡ ∆m 2 4Eν and introduce f (L) as the distribution of baselines in the experiment. The denominator is the mean-inverse-squared baseline, which we denote L −2 . We discuss our implementation of these integrals below.

A.2 Implementation in GLoBESfit
The survival probability P ee is calculated in GLoBES with custom oscillation engines using the function glbDefineOscEngine, which is called in the main program file. A sample call of this has the following form: glbDefineOscEngine(NUMP, &<exp>_probability_matrix, &glf_get_oscillation_parameters, &glf_set_oscillation_parameters, "<engine_name>", user_data); We explain the arguments of this function below. NUMP: the maximum number of oscillation parameters used. For four-neutrino oscillations, this number should be 12; practically, we take it to be 6. <exp>_probability_matrix: this is the function in which the oscillation probabilities are calculated for experiment <exp>. This function produces a three-by-three matrix P whose elements are the active-active oscillation probabilities (i.e., P[0][0] corresponds JHEP01(2021)167 to P ee , P[0] [1] to P eµ , etc.). Currently, these functions only calculate P ee ; all other probabilities are fixed to 0. We discuss the evaluation of the probabilities in the next subsection.
glf_get_oscillation_parameters: this function retrieves the current values of the oscillation parameters.
glf_set_oscillation_parameters: this function sets the values of the oscillation parameters.
"<engine_name>": this is the internal name of the oscillation engine, which is called in the corresponding AEDL file via $oscillation_engine="<engine_name>".
user_data: a void pointer to user-specified data that are relevant for the evaluation of the probability. We discuss the use of this apparatus below.

A.3 Custom probability matrices
For most of the experiments we consider, it is sufficient to take the core to be point-like but to give the detector finite extent. We usually make the approximation that the detector constitutes a (portion of a) spherical shell centered around the core. The function f (L) is then flat over the extent of the detector, which we consider to be L a < L < L b ; this leads to a relatively simple expression for L −2 , as well as the function F (q) in eq. (A.8): where Si(x) = x 0 sin t t dt is the sine integral. For each experiment of this sort, the width of the detector is specified in GLoBESfit as the sole element of an array typically named <exp>_wide. This is passed into the probability engine by setting "user_data" to be "(void *) <exp>_wide" in glbDefineOscEngine, above. The function in eq. (A.10) is evaluated, using this input, with a helper function called lsin(q, La, Lb). Here, La and Lb are the nearest and further baseline at the experiment, determined from the width and the (average) baseline thereof. Note that all lengths in GLoBES are assumed to be in km. Moreover, since GLoBES also assumes all energies are in GeV, q in this function is numerically given by Occasionally, the experimental geometry results in a more complicated f (L) whose structure we must take into account. This distribution is estimated using a simple Monte Carlo routine. Pairs of points are randomly drawn from the core and detector, whose geom-JHEP01(2021)167 etry we must specify by hand. One can then numerically integrate over this distribution to determine L −2 and F (q); this numerical integral is performed as a sum over these pairs.
To save computation time, F (q) is precomputed on a grid and is fed into GLoBESfit as a look-up table, whose elements are of the form { log_10 q, F(q)}, that resides in auxiliary header files. Linear interpolation in log_10 q is used to get intermediate values.
If q is less than some minimum specified value q min , then it is assumed that the integral scales quadratically and F (q) is determined by extrapolation. If q is instead greater than some maximum specified value q max , then it is assumed that F (q) has averaged out to 0.5. The values of q min and q max depend on the experiment under consideration. We adopt this method for Daya Bay and RENO for both the rate and spectral analyses; moreover, we do this for Bugey-3, 23 DANSS, Double Chooz and NEOS for our spectral analysis.
The precomputed F (q) is fed into the custom oscillation engine via the "user_data" option. This is handled by a custom data structure called "glf_distance_data", whose elements are the length of the precomputed array and a pointer to the array itself. The structure for experiment <exp> is named "<exp>_s"; this is handled by setting "user_data" to be "(void *) &<exp>_s" in the definition of the oscillation engine.
For medium-baseline experiments, each detector and reactor can be considered pointlike, but these experiments are sourced by multiple reactors. The reactors generically have different average powers over the duration of the experiment; these weight their contributions. The integrals over baselines in L −2 and F (q) are replaced by sums: where r indexes the reactors, of which there are N r , and the average power of each is P r . For rate-only analyses, the multiple reactors at Chooz, Double Chooz and Palo Verde are handled using different AEDL experiment definitions; the resulting event rates are added at the analysis level. Meanwhile, for rate-only analyses of Daya Bay and RENO, as well as the spectral analyses at Daya Bay, Double Chooz and RENO, the contributions of multiple reactors are combined at the level of the AEDL file. This is purely a function of the development history of GLoBESfit and does not reflect differing physics in these cases.

B Structure of the code
In this appendix, we document the component files of GLoBESfit and briefly describe the function and contents of each.

B.1 Common files
We begin with files that are common to both the rate and spectral modules.
• Medium-baseline, fuel-evolution experiments are contained in rate_combo3.glb: These numberings correspond to the internal GLoBES ordering. Note that some AEDL files apply to more than one experiment. The ordering of these experiments is a function of the development history of GLoBESfit and is in no way a commentary of any of these experiments.

Daya
We define a GLoBES rule for each of the four main, fissile isotopes for each experiment. This allows the user to separate out contribution of each isotope in whatever calculation they are interested in. Moreover, it cuts down on the number of flux files that one must keep on hand; instead of defining a separate flux file for each experiment, one need only combine the fluxes from each isotope with the appropriate fuel fraction. Moreover, the theoretical uncertainties on the isotopic IBD yields for 235 U, 238 U, 239 Pu and 241 Pu are declared in the AEDL definition for Bugey-4/Bugey-3 (15 m) in the field @sys_on_errors. We have also made clones of this file that modify how these systematics are treated; see below.
Because the absolute numbers of events are not relevant for our analyses, @time, @power, @norm and $target_mass are all generally set to 1.0 in the AEDL definitions;

JHEP01(2021)167
we comment below on instances in which we deviate from this. The fluxes from each of the four main fissile isotopes for each experiment are considered separately and reweighted by the effective fuel fractions given by each experiment. Consequently, @time, @power and @norm are set to be the same for each component of the flux. The value of $length_tab is set to 1/ L −2 for each experiment; Exceptions to this occur for medium-baseline experiments -namely, Palo Verde, Double Chooz, Chooz, Daya Bay and RENO -which are typically sourced by multiple reactors, or are comprised of multiple detectors, whose relative contributions must be taken account.
• For Palo Verde and Chooz, @norm is set equal to the numerical value of the exposure, in GW·yr, for each detector, which we have calculated in section 3.
• We do something similar for Daya Bay. We set @power equal to the total power delivered to each detector, namely where r indexes the eight reactor cores and s indexes the periods of operation; the Daya Bay rate-evolution analysis is based on the 1230-day data set from ref. [56], so we only consider the 6AD and 8 AD periods. We set @norm to be equal to each detectors total efficiency, and $target_mass to be equal to each detector's mass in tons.
• Since the relative contributions of the reactors used in Double Chooz near detector have not been specified, we assume them to be equal. This allows us to set @norm to 1.0 for each.
rate_combo_no_sys.glb: similar to rate_combo.glb, except no systematic uncertainties are associated with the HM flux predictions. In this mode, GLoBESfit recycles the AEDL definitions in rate_combo2.glb and rate_combo3.glb. rate_combo_unfix.glb: similar to rate_combo.glb, except systematic uncertainties are only applied to 238 U and 241 Pu. The IBD yields from 235 U and 239 Pu are left unfixed to allow them to be scanned in our analysis. Note that in this mode, GLoBESfit recycles the AEDL definitions in rate_combo2.glb and rate_combo3.glb. rate_combo_SM{2,3}.glb: similar to rate_combo{2,3}.glb, except that the SM fluxes are used in lieu of the HM fluxes. rate_combo_HKSS{2,3}.glb: similar to rate_combo{2,3}.glb, except that the HKSS fluxes are used in lieu of the HM fluxes.
glf_rate_aux.h: experimental inputs for our rate analyses. These include the fuel fractions and experimental IBD rates at each experiment, as well as the (inverses of the) covariance matrices that are used in the determination(s) of the chi-squared function(s). This is called in glf_rate_chi.c.
glf_rate_chi.c: the file containing the chi-squared functions to be used with the GLoBES files described above. There are five such functions -glf_rate_chi_nosys, The indices used here correspond to the internal GLoBES indices for each experiment. As with rate experiments, we define separate GLoBES rules for each of the four main fissile isotopes.
glf_spectrum_aux_1.h: covariance matrices used in our spectral analyses. This is called in glf_spectrum_chi.c.
glf_spectrum_aux_2.h: experimental data for our spectral analyses, including the relevant measured spectral ratios and the stated fuel fractions. This is called in glf_spectrum_chi.c.
glf_spectrum_chi.c: the file containing the definition of the chi-squared function(s) detailed in section 5.
glf_spectrum_chi.h: the header file corresponding to glf_spectrum_chi.c. This is called by glf_spectrum.c.
glf_spectrum.c: the main file that gets run by the executable ./glf_spectrum. The structure of this file is similar to main_rate.c, above.

B.4 Executables and options
./glf_rate: the main executable for the rate-only aspect of GLoBESfit.
./glf_spectrum: the main executable for the spectrum aspect of GLoBESfit. We summarize particularly useful command-line options common to either executable.
• -bN: allows the user to turn off a particular block of experiments corresponding to the index N. (We call these "blocks" because "experiment" can be ambiguous in the parlance of GLoBES. Removing Daya Bay from the calculation, for instance, requires that we ignore several experiments in the AEDL file.) We comment on these blocks below.
• -rN: sets the number of points used in each direction in the parameter scan to be N. By default, N = 51. . If left blank, then the output is set to stdout, i.e., it writes to screen.
Additionally, there are some options that are specific to the rate-only analysis.
• -S: turns on systematics stemming from the HM flux predictions. This changes the chi-squared function from glf_rate_chi_nosys to glf_rate_chi.
• -M: replaces the HM fluxes with the SM fluxes, using the systematic uncertainties of the former. This changes the chi-squared function from glf_rate_chi_nosys to glf_rate_chi_SM.
• -H: replaces the HM fluxes with the HKSS fluxes, including their own uncertainties. This changes the chi-squared function from glf_rate_chi_nosys to glf_rate_chi_HKSS.
• -u: turns on systematics stemming from 238 U and 241 Pu and scans over the normalizations of the 235 U and 239 Pu fluxes instead of considering a sterile neutrino. This changes the chi-squared function from glf_rate_chi_nosys to glf_rate_chi_unfix.
Note that these options are mutually exclusive and have been listed in order of priorityfor instance, entering "-M -S -H" at the command line will be treated as if only "-S" has been entered. There is one further rate-specific option: • -w: in addition to the values of sin 2 2θ ee , ∆m 2 41 and χ 2 , this option toggles writing the best-fit values of the four nuisance parameters ξ 235 , ξ 238 , ξ 239 and ξ 241 to disk, in that order. This only works in conjunction with -S, -M and -H.
Finally, we list some options that are specific to the spectral-ratio analysis.
• -T: in addition to scanning over sin 2 2θ ee and ∆m 2 41 , this option triggers an additional scan over sin 2 2θ 13 on the range [0.0, 0.2].
• -C: this option eschews the scan over the sterile neutrino parameter space altogether and instead scans over sin 2 2θ 13 on the range [0.05, 0.2] and ∆m 2 31 on the range [1.0, 4.0] × 10 −3 eV 2 . (This option has been used to perform the calibration scan whose results are shown in figure 33.) Experimental blocks. Currently, glf_rate is not set up to isolate every individual experiment. However, given that the covariance matrix decomposes into blocks, this allows us to trivially remove blocks of experiments from the analysis. Most of these blocks consist of only one experiment, while the rest include multiple experiments. Here, we tabulate which experiments correspond to which of these blocks.   Table 15. Relevant experimental data for Daya Bay. The first row shows the total fiducial target mass of the detector in metric tons [55]. The second row shows the total efficiency ε tot for each detector [54]. The third row shows the effective baseline of each detector in meters, calculated using the distances in table 13 and the powers in table 14. The last four rows are the average effective fuel fraction at each detector at Daya Bay, assumed to be the same for the 6AD and 8AD period [54].

C.2 Experimental measurements
We show plots of the ratios of spectra measured at Bugey-3, DANSS, Daya Bay, Double Chooz, NEOS and RENO, which underpin our analyses in sections 5 and 6, in figures 38 and 39. The black points and error bars represent the data and their statistical uncertainties. The colored curves represent the expectations assuming three-neutrino oscillations; when relevant, the value of sin 2 2θ 13 used to calculate the line is the best-fit value as determined by that collaboration. The colored bands represent the total systematic uncertainties, given by the square root of the diagonal elements of the covariance matrix for each experiment.

C.3 Markov chain Monte Carlo results
In figures 40 and 41, we show the result of our Markov-Chain Monte Carlo analyses described in section 7. In the two-dimensional plots, the blue, orange and red contours represent the 68.3%, 95% and 99% credible regions (C.R.); the one-dimensional plots add to this list a green curve representing the 90% C.R. Above each one-dimensional figure is shown the best-fit value of the corresponding parameter, as well as extent of the 68.3% C.R. region. To determine the posterior probability, we use 100 chains of walkers with randomized starting points. Each walker takes 1500 steps, of which the first 500 are conservatively ignored for the purposes of burn-in. The calculations have been performed using emcee [82]. Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.