Sterile Neutrino Oscillations: The Global Picture

Neutrino oscillations involving eV-scale neutrino mass states are investigated in the context of global neutrino oscillation data including short and long-baseline accelerator, reactor, and radioactive source experiments, as well as atmospheric and solar neutrinos. We consider sterile neutrino mass schemes involving one or two mass-squared differences at the eV^2 scale denoted by 3+1, 3+2, and 1+3+1. We discuss the hints for eV-scale neutrinos from nu_e disappearance (reactor and Gallium anomalies) and nu_mu->nu_e appearance (LSND and MiniBooNE) searches, and we present constraints on sterile neutrino mixing from nu_mu and neutral-current disappearance data. An explanation of all hints in terms of oscillations suffers from severe tension between appearance and disappearance data. The best compatibility is obtained in the 1+3+1 scheme with a p-value of 0.2% and exceedingly worse compatibilities in the 3+1 and 3+2 schemes.


Introduction
Huge progress has been made in the study of neutrino oscillations [1][2][3][4], and with the recent determination of the last unknown mixing angle θ 13 [5][6][7][8][9][10] a clear first-order picture of the three-flavor lepton mixing matrix has emerged, see e.g. [11]. Besides those achievements there are some anomalies which cannot be explained within the three-flavor framework and which might point towards the existence of additional neutrino flavors (so-called sterile neutrinos) with masses at the eV scale: • The LSND experiment [12] reports evidence forν µ →ν e transitions with E/L ∼ 1 eV 2 , where E and L are the neutrino energy and the distance between source and detector, respectively.
• This effect is also searched for by the MiniBooNE experiment [13][14][15][16][17], which reports a yet unexplained event excess in the low-energy region of the electron neutrino and anti-neutrino event spectra. No significant excess is found at higher neutrino energies.
Interpreting the data in terms of oscillations, parameter values consistent with the ones from LSND are obtained.
• Radioactive source experiments at the Gallium solar neutrino experiments SAGE and GALLEX have obtained an event rate which is somewhat lower than expected. This effect can be explained by the hypothesis of ν e disappearance due to oscillations with ∆m 2 1 eV 2 [18,19] ("Gallium anomaly").
• A recent re-evaluation of the neutrino flux emitted by nuclear reactors [20,21] has led to somewhat increased fluxes compared to previous calculations [22][23][24][25]. Based on the new flux calculation, the results of previous short-baseline (L 100 m) reactor experiments are in tension with the prediction, a result which can be explained by assumingν e disappearance due to oscillations with ∆m 2 ∼ 1 eV 2 [26] ("reactor anomaly").
Sterile neutrino oscillation schemes have been considered for a long time, see e.g. [27][28][29][30] for early references on four-neutrino scenarios. Effects of two sterile neutrinos at the eV scale have been considered first in [31,32], oscillations with three sterile neutrinos have been investigated in [33,34]. Thus, while the phenomenology of sterile neutrino models is well known, it has also been known for a long time that the LSND and MiniBooNE ( -) ν e appearance signals are in tension with bounds from disappearance experiments [35][36][37], challenging an interpretation in terms of sterile neutrino oscillations. This problem remains severe, and in the following we will give a detailed discussion of the status of the ( -) ν µ → ( -) ν e appearance hints from LSND and MiniBooNE in the light of recent global data. The situation is better for the hints for ( -) ν e disappearance from the reactor and Gallium anomalies, which are not in direct conflict with any other data. This somewhat ambiguous situation asks for an experimental answer, and indeed several projects are under preparation or under investigation, ranging from experiments with radioactive sources, short-baseline reactor experiments, to new accelerator facilities. A recent review on light sterile neutrinos including an overview on possible experimental tests can be found in [38].
In this paper we provide an extensive analysis of the present situation of sterile neutrino scenarios. We discuss the possibility to explain the tentative positive signals from LSND and MiniBooNE, as well as the reactor and Gallium anomalies in terms of sterile neutrino oscillations in view of the global data. New ingredients with respect to our previous analysis [39] are the following.
• We use latest data from the MiniBooNE ( -) ν µ → ( -) ν e appearance searches [15][16][17]. Our MiniBooNE appearance analysis is now based on Monte Carlo events provided by the collaboration taking into account realistic event reconstruction, correlation matrices, as well as oscillations of various background components in a consistent way.
• We include the constraints on the appearance probability from E776 [40] and ICARUS [41].
• We include the Gallium anomaly in our fit.
• We take into account constraints from solar neutrinos, the KamLAND reactor experiment, and LSND and KARMEN measurements of the reaction ν e + 12 C → e − + 12 N.
• The treatment of the reactor anomaly is improved and updated by taking into account small changes in the predicted anti-neutrino fluxes as well as an improved consideration of systematic errors and their correlations.
• We take into account charged-current (CC) and neutral-current (NC) data from the MINOS long-baseline experiment [42,43].
• In our analysis of atmospheric neutrino data, we improve our formalism to fully take into account the mixing of ν e with other active or sterile neutrino states.
All the data used in this work are summarized in Tab. 1. For other recent sterile neutrino global fits see [34,46,47]. We are restricting our analysis to neutrino oscillation data; implications for kinematic neutrino mass measurements and neutrino-less double betadecay data have been discussed recently in [48][49][50].
Sterile neutrinos at the eV scale also have implications for cosmology. If thermalized in the early Universe they contribute to the number of relativistic degrees of freedom (effective number of neutrino species N eff ). A review with many references can be found in [38]. Indeed there might be some hints from cosmology for additional relativistic degrees of freedoms (N eff bigger than 3), coming mainly from CMB data, e.g. [47,[51][52][53][54][55]. Recently precise CMB data from the PLANCK satellite have been released [56]. Depending on which additional cosmological data are used, N eff values ranging from 3.30 +0. 54 −0.51 to 3.62 +0.05 −0.48 (uncertainties at 95% CL) are obtained [56]. Constraints from Big Bang Nucleosynthesis on N eff have been considered recently in [57]. Apart from their contribution to N eff , thermalized eV-scale neutrinos would also give a large contribution to the sum of neutrino masses, which is constrained to be below around 0.5 eV. The exact constraint depends on which cosmological data sets are used, but the most important observables are those related to galaxy clustering [51][52][53][54]. In the standard ΛCDM cosmology framework the bound on the sum of neutrino masses is in tension with the masses required to explain the aforementioned terrestrial hints [54]. The question to what extent such sterile neutrino scenarios are disfavored by cosmology and how far one would need to deviate from the ΛCDM model in order to accommodate them remains under discussion [47,58,59]. We will not include any information from cosmology explicitly in our numerical analysis. However, we will keep in mind that neutrino masses in excess of few eV may become more and more difficult to reconcile with cosmological observations.  Table 1. Summary of the data used in this work divided into ( -) ν e , ( -) ν µ disappearance, and appearance data. The column "dof" gives the number of data points used in our analysis minus the number of free nuisance parameters for each experiment.
The outline of the paper is as follows. In Sec. 2 we introduce the formalism of sterile neutrino oscillations and fix the parametrization of the mixing matrix. We then consider ( -) ν e disappearance data in Sec. 3, discussing the reactor and Gallium anomalies. Constraints from ( -) ν µ disappearance as well as neutral-current data are discussed in Sec. 4, and global ( -) ν µ → ( -) ν e appearance data including the LSND and MiniBooNE signals in Sec. 5. The global fit of all these data combined is presented in Sec. 6 for scenarios with one or two sterile neutrinos. We summarize our results and conclude in Sec. 7. Supplementary material is provided in the appendices including a discussion of complex phases in sterile neutrino oscillations, oscillation probabilities for solar and atmospheric neutrinos, as well as technical details of our experiment simulations.

Oscillation parameters in the presence of sterile neutrinos
In this work we consider the presence of s = 1 or 2 additional neutrino states with masses in the few eV range. When moving from 1 to 2 sterile neutrinos the qualitative new feature is the possibility of CP violation already at short-baseline [33,60]. 1 The neutrino mass eigenstates ν 1 , . . . , ν 3+s are labeled such that ν 1 , ν 2 , ν 3 contribute mostly to the active flavor eigenstates and provide the mass squared differences required for "standard" three-flavor oscillations, ∆m 2 21 ≈ 7.5 × 10 −5 eV 2 and |∆m 2 31 | ≈ 2.4 × 10 −3 eV 2 , where The mass states ν 4 , ν 5 are mostly sterile and provide mass-squared differences in the range 0.1 eV 2 |∆m 2 41 |, |∆m 2 51 | 10 eV 2 . In the case of only one sterile neutrino, denoted by "3+1" in the following, we always assume ∆m 2 41 > 0, but the oscillation phenomenology for ∆m 2 41 < 0 would be the same. For two sterile neutrinos, we distinguish between a mass spectrum where ∆m 2 41 and ∆m 2 51 are both positive ("3+2") and where one of them is negative ("1+3+1"). The phenomenology is slightly different in the two cases [61]. We assume that the s linear combinations of mass states which are orthogonal to the three flavor states participating in weak interactions are true singlets and have no interaction with Standard Model particles. Oscillation physics is then described by a rectangular mixing matrix U αi with α = e, µ, τ and i = 1, ..., 3+s, and i U * αi U βi = δ αβ . 2 We give here expressions for the oscillation probabilities in vacuum, focusing on the 3+2 case. It is trivial to recover the 3+1 formulas from them by simply dropping all terms involving the index "5". Formulas for the 1+3+1 scenario are obtained by taking either ∆m 2 51 or ∆m 2 41 negative. Oscillation probabilities relevant for solar and atmospheric neutrinos are given in appendices C and D, respectively.
First we consider the so-called "short-baseline" (SBL) limit, where the relevant range of neutrino energies and baselines is such that effects of ∆m 2 21 and ∆m 2 31 can be neglected. Then, oscillation probabilities depend only on ∆m 2 i1 and U αi with i ≥ 4. We obtain for the appearance probability Eq. (2.1) holds for neutrinos; for anti-neutrinos one has to replace γ αβ → −γ αβ . Since Eq. (2.1) is invariant under the transformation 4 ↔ 5 and γ αβ → −γ αβ , we can restrict the parameter range to ∆m 2 54 ≥ 0, or equivalently ∆m 2 51 ≥ ∆m 2 41 , without loss of generality. Note also that the probability Eq. (2.1) depends only on the combinations |U α4 U β4 | and |U α5 U β5 |. The only SBL appearance experiments we are considering are in the ( -) ν µ → ( -) ν e channel. Therefore, the total number of independent parameters is 5 if only SBL appearance experiments are considered.
The 3+2 survival probability, on the other hand, is given in the SBL approximation by In this work we include also experiments for which the SBL approximation cannot be adopted, in particular MINOS and ICARUS. For these experiments φ 31 is of order one. In the following we give the relevant oscillation probabilities in the limit of φ 41 , φ 51 , φ 54 → ∞ and φ 21 → 0. We call this the long-baseline (LBL) approximation. In this case we obtain for the neutrino appearance probability (α = β) The corresponding expression for anti-neutrinos is obtained by the replacement I αβij → I * αβij . The survival probability in the LBL limit can be written as Note that in the numerical analysis of MINOS data neither the SBL nor the LBL approximations can be used because φ 31 , φ 41 and φ 51 can all become of order one either at the far detector or at the near detector [65]. Moreover, matter effects cannot be neglected in MINOS. All of these effects are properly included in our numerical analysis of the MINOS experiment.
Sometimes it is convenient to complete the 3 × (3 + s) rectangular mixing matrix by s rows to an n × n unitary matrix, with n = 3 + s. For n = 5 we use the following parametrization for U : where O ij represents a real rotation matrix by an angle θ ij in the ij plane, and V ij represents a complex rotation by an angle θ ij and a phase ϕ ij . The particular ordering of the rotation matrices is an arbitrary convention which, however, turns out to be convenient for practical reasons. 3 We have dropped the unobservable rotation matrix V 45 which just mixes sterile states. There is also some freedom regarding which phases are removed by field redefinitions and which ones are kept as physical phases. In appendix A we give a specific recipe for how to remove unphysical phases in a consistent way. Throughout this work we consider only phases which are phenomenologically relevant in neutrino oscillations. Under certain approximations, more phases may become unphysical. For instance, if an angle which corresponds to a rotation which can be chosen to be complex is zero the corresponding phase disappears. In practical situations often one or more of the mass-squared differences can be considered to be zero, which again implies that some of the angles and phases will become unphysical. In Tab Table 2. Mixing angle and Phase counting for s = 2 (3+2) and s = 1 (3+1) sterile neutrino schemes. The column "A/P" denotes the number of physical angles and phases, respectively. The column "LBL approx." ("SBL approx.") corresponds to the approximation ∆m 2 21 → 0 (∆m 2 21 → 0, ∆m 2 31 → 0). We give also specific examples for which angles can be chosen real, by denoting with V ij (O ij ) a complex (real) rotation. ϕ ij in a CP-even fashion. Our parametrization Eq. (2.6) guarantees that ( -) ν e disappearance experiments are independent of ϕ ij .
3 ν e andν e disappearance searches Disappearance experiments in the ( -) ν e sector probe the moduli of the entries in the first row of the neutrino mixing matrix, |U ei |. In the short-baseline limit of the 3+1 scenario, the only relevant parameter is |U e4 |. For two sterile neutrinos, also |U e5 | is relevant. In this section we focus on 3+1 models, and comment only briefly on 3+2. For 3+1 oscillations in the SBL limit, the ( -) ν e survival probability takes an effective two flavor form where we have defined an effective ( -) ν e -disappearance mixing angle by This definition is parametrization independent. Using the specific parametrization of Eq. (2.6) it turns out that θ ee = θ 14 .

SBL reactor experiments
The data from reactor experiments used in our analysis are summarized in Tab. 3. Our simulations make use of a dedicated reactor code based on previous publications, see e.g. [79,80]. We have updated the code to include the latest data and improved the treatment of uncertainties, see appendix B for details. The code used here is very similar to the one from Ref. [11], extended to sterile neutrino oscillations. The reactor experiments listed in Tab. 3 can be divided into short-baseline (SBL) experiments with baselines < 100 m, long-baseline (LBL) experiments with 100 m < L < 2 km, and the very longbaseline experiment KamLAND with an average baseline of 180 km. SBL experiments are not sensitive to standard three-flavor oscillations, but can observe oscillatory behavior for ∆m 2 41 , ∆m 2 51 ∼ 1 eV 2 . On the other hand, for long-baseline experiments, oscillations due to (∆m 2 31 , θ 13 ) are most relevant, and oscillations due to eV 2 -scale mass-squared differences are averaged out and lead only to a constant flux suppression. KamLAND is sensitive to oscillations driven by (∆m 2 21 and θ 12 ), whereas all θ 1k with k ≥ 3 lead only to a constant flux reduction.  [75] 820 1 rate Chooz [76] 1050 14 bins DoubleChooz [10] 1050 18 bins DayaBay [77] 6 rates -1 norm RENO [9] 2 rates -1 norm KamLAND [78] 17 bins Table 3. Reactor data used in our analysis. The experiments in the upper part of the table have baselines L < 100 m and are referred to as SBL reactor experiments. For these experiments we list the baseline, the ratio of the observed and predicted rates (based on the flux predictions from [20,21]), the uncorrelated error, and the total experimental error (i.e., the square-root of the diagonal entry of the correlation matrix). Uncertainties from the neutrino flux prediction are not included here, but are taken into account in our numerical analysis. For details on the correlations and flux errors see appendix B. In the lower part of the table, we list experiments with baselines of order 1 km (LBL reactors), and the KamLAND experiment with an average baseline of 180 km. For DayaBay, RENO, and KamLAND, we do not give a number for the baseline here because several baselines are involved in each of these experiments. The number of SBL data points is 19 or 76 and the total number of reactor data points is 75 or 132, depending on whether a total rates analysis (3 data points) or a spectral analysis (25+25+10 bins) is used for the Bugey3 experiment.
For the SBL reactor experiments we show in Tab. 3 also the ratio of the observed and predicted rate, where the latter is based on the flux calculations of [21] for neutrinos from 235 U, 239 Pu, 241 Pu fission and [20] for 238 U fission. The ratios are taken from [38] (which provides and update of [26]) and are based on the Particle Data Group's 2011 value for the neutron lifetime, τ n = 881.5 s [81]. 4 We observe that most of the ratios are smaller than one. In order to asses the significance of this deviation, a careful error analysis is necessary. In the last column of Tab. 3, we give the uncorrelated errors on the rates. They  Table 4. Best fit oscillation parameters and χ 2 min values as well as ∆χ 2 no-osc ≡ χ 2 no-osc − χ 2 min within a 3+1 framework. Except in the row labeled "SBL rates only", we always include spectral data from Bugey3. The row "global ν e disapp." includes the data from reactor experiments (see Tab. 3) as well as Gallium data, solar neutrinos and the LSND/KARMEN ν e disappearance data from ν e -12 C scattering. The CL for the exclusion of the no oscillation hypothesis is calculated assuming 2 degrees of freedom (|U e4 | and ∆m 2 41 ).
include statistical as well as uncorrelated experimental errors. In addition to these, there are also correlated experimental errors between various data points which are described in detail in appendix B. Furthermore, we take into account the uncertainty on the neutrino flux prediction following the prescription given in [21], see also appendix B for details. Fitting the SBL data to the predicted rates we obtain χ 2 /dof = 23.0/19 which corresponds to a p-value of 2.4%. When expressed in terms of an energy-independent normalization factor f , the best fit is obtained at Here ∆χ 2 f =1 denotes the improvement in χ 2 compared to a fit with f = 1. Clearly the p-value increases drastically when f is allowed to float, leading to a preference for f = 1 at the 2.7σ confidence level. This is our result for the significance of the reactor anomaly. Let us mention that (obviously) this result depends on the assumed systematic errors. While we have no particular reason to doubt any of the quoted errors, we have checked that when an adhoc additional normalization uncertainty of 2% (3%) is added, the significance is reduced to 2.1σ (1.7σ). This shows that the reactor anomaly relies on the control of systematic errors at the percent level.
The flux reduction suggested by the reactor anomaly can be explained by sterile neutrino oscillations. In Tab. 4 we give the best fit points and χ 2 values obtained by fitting SBL reactor data in a 3+1 framework. The allowed regions in ∆m 2 41 and sin 2 2θ 14 are shown in Fig. 1 (left) for a rate-only analysis as well as a fit including also Bugey3 spectral data. Both analyses give consistent results, with the main difference being that the spectral data disfavors certain values of ∆m 2 41 around 0.6 − 0.7 eV 2 and 1.3 eV 2 . The right panel of Fig. 1 shows the predicted rate suppression as a function of the baseline compared to the data. We show the prediction for the two best fit points from the left panel as well as one point located in the island around ∆m 2 41 0.9 eV 2 , which will be important in the combined fit with SBL appearance data. We observe that for the rateonly best fit point with ∆m 2 41 = 0.44 eV 2 the prediction follows the tendency suggested by the ILL, Bugey4, and SRP (24 km)  even shorter baselines. However, from the GOF values given in Tab. 4 we conclude that also those solutions provide a good fit to the data.

The Gallium anomaly
The response of Gallium solar neutrino experiments has been tested by deploying radioactive 51 Cr or 37 Ar sources in the GALLEX [84,85] and SAGE [86,87] detectors. Results are reported as ratios of observed to expected rates, where the latter are traditionally computed using the best fit cross section from Bahcall [88], see e.g. [19]. The values for the cross sections weighted over the 4 (2) neutrino energy lines from Cr (Ar) from [88] are σ B (Cr) = 58.1 × 10 −46 cm 2 , σ B (Ar) = 70.0 × 10 −46 cm 2 . While the cross section for 71 Ga → 71 Ge into the ground state of 71 Ge is well known from the inverse reaction there are large uncertainties when the process proceeds via excited states of 71 Ge at 175 and 500 keV. Following [88], the total cross section can be written as  Table 5. Best fit point of SBL reactor data and SBL reactor + Gallium data in a 3+2 oscillation scheme. We give the mass-squared differences in eV 2 and the mixing angles in radians. The relation to the mixing matrix elements is |U e4 | = cos θ 15 sin θ 14 and |U e5 | = sin θ 15 . The ∆χ 2 relative to 3+1 oscillations is evaluated for 2 dof, corresponding to the two additional parameters, while for the ∆χ 2 relative to no-oscillations we use 4 dof.
This means that the ratios of observed to expected rates based on the Bahcall prediction have to be rescaled by a factor 0.982 (0.977) for the Cr (Ar) experiments, so that we obtain for them the following updated numbers for our fits:

GALLEX:
R 1 (Cr) = 0.94 ± 0.11 [85] R 2 (Cr) = 0.80 ± 0.10 [85] , SAGE: R 3 (Cr) = 0.93 ± 0.12 [86] R 4 (Ar) = 0.77 ± 0.08 [87] . Here, we have symmetrized the errors, and we have included only experimental errors, but not the uncertainty on the cross section (see below). We build a χ 2 out of the four data points from GALLEX and SAGE and introduce two pulls corresponding to the systematic uncertainty of the two transitions to excited state according to Eq. (3.5). The determination of BGT 175 is relatively poor, with zero being allowed at 2σ. In order to avoid unphysical negative contributions from the 175 keV state, we restrict the domain of the corresponding pull parameter accordingly. Fitting the four data points with a constant neutrino flux normalization factor r we find Because of the different cross sections used, these results differ from the ones obtained in [19], where the best fit point is at r = 0.76, while the significance is comparable, around 3σ. An updated analysis including also a discussion of the implications of the measurement in [89] can be found in [90]. The event deficit in radioactive source experiments can be explained by assuming ν e mixing with an eV-scale state, such that ν e disappearance happens within the detector volume [18]. We fit the Gallium data in the 3+1 framework by averaging the oscillation probability over the detector volume using the geometries given in [18]. The resulting allowed region at 95% confidence level is shown in orange in Fig. 2. Consistent with the above discussion we find mixing angles somewhat smaller than those obtained by the authors of [19]. The best fit point from combined Gallium+SBL reactor data is given in Tab. 4, and the no-oscillation hypothesis is disfavored at 99.9% CL (2 dof) or 3.3σ compared to the 3+1 best fit point.
Let us consider now the Gallium and SBL reactor data in the framework of two sterile neutrinos, in particular in the 3+2 scheme. SBL ν e andν e disappearance data depend on 4 parameters in this case, ∆m 2 41 , ∆m 2 51 , and the two mixing angles θ 14 and θ 15 (or, equivalently, the moduli of the two matrix elements U e4 and U e5 ). We report the best fit points from SBL reactor data and from SBL reactor data combined with the Gallium . Allowed regions at 95% CL (2 dof) for 3+1 oscillations. We show SBL reactor data (blue shaded), Gallium radioactive source data (orange shaded), ν e disappearance constraints from ν e -12 C scattering data from LSND and KARMEN (dark red dotted), long-baseline reactor data from CHOOZ, Palo Verde, DoubleChooz, Daya Bay and RENO (blue short-dashed) and solar+KamLAND data (black long-dashed). The red shaded region is the combined region from all these ν e andν e disappearance data sets.
source data in Tab. 5. For these two cases we find an improvement of 5.3 and 3.8 units in χ 2 , respectively, when going from the 3+1 scenario to the 3+2 case. Considering that the 3+2 model has two additional parameters compared to 3+1, we conclude that there is no improvement of the fit beyond the one expected by increasing the number of parameters, and that SBL ( -) ν e data sets show no significant preference for 3+2 over 3+1. This is also visible from the fact that the confidence level at which the no oscillation hypothesis is excluded does not increase for 3+2 compared to 3+1, see the last columns of Tabs. 4 and 5. There the ∆χ 2 is translated into a confidence level by taking into account the number of parameters relevant in each model, i.e., 2 for 3+1 and 4 for 3+2.

Global data on ν e andν e disappearance
Let us now consider the global picture regarding ( -) ν e disappearance. In addition to the short-baseline reactor and Gallium data discussed above, we now add data from the following experiments: • The remaining reactor experiments at a long baseline ("LBL reactors") and the very long-baseline reactor experiment KamLAND, see table 3.
• Global data on solar neutrinos, see appendix C for details.
• LSND and KARMEN measurements of the reaction ν e + 12 C → e − + 12 N [91,92]. The experiments have found agreement with the expected cross section [93], hence their measurements constrain the disappearance of ν e with eV-scale mass-squared differences [94,95]. Details on our analysis of the 12 C scattering data are given in appendix E.1.
So far the LBL experiments DayaBay and RENO have released only data on the relative comparison of near (L ∼ 400 m) and far (L ∼ 1.5 km) detectors, but no information on the absolute flux determination is available. Therefore, their published data are essentially insensitive to oscillations with eV-scale neutrinos and they contribute only indirectly via constraining θ 13 . In our analysis we include a free, independent flux normalization factor for each of those two experiments. Chooz and DoubleChooz both lack a near detector. Therefore, in the official analyses performed by the respective collaborations the Bugey4 measurement is used to normalize the flux. This makes the official Chooz and DoubleChooz results on θ 13 also largely independent of the presence of sterile neutrinos. However, the absolute rate of Bugey4 in terms of the flux predictions is published (see Tab. 3) and we can use this number to obtain an absolute flux prediction for Chooz and DoubleChooz. Therefore, in our analysis Chooz and DoubleChooz (as well as Palo Verde) by themselves also show some sensitivity to sterile neutrino oscillations. In a combined analysis of Chooz and DoubleChooz with SBLR data the official analyses are recovered approximately. Previous considerations of LBL reactor experiments in the context of sterile neutrinos can be found in Refs. [96][97][98][99].
We show in Tab. 4 a combined analysis of the SBL and LBL reactor experiments (row denoted by "SBL+LBL"), where we minimize with respect to θ 13 . We find that the significance of the reactor anomaly is not affected by the inclusion of LBL experiments and finite θ 13 . The ∆χ 2 no-osc even slightly increases from 9.0 to 9.2 when adding LBL data to the SBL data ("no-osc" refers here to θ 14 = 0). Hence, we do not agree with the conclusions of Ref. [100], which finds that the significance of the reactor anomaly is reduced to 1.4σ when LBL data and a finite value of θ 13 is taken into account.
Solar neutrinos are also sensitive to sterile neutrino mixing (see e.g. [101][102][103]). The main effect of the presence of ν e mixing with eV states is an over-all flux reduction. While this effect is largely degenerate with θ 13 , a non-trivial bound is obtained in the combination with DayaBay, RENO and KamLAND. KamLAND is sensitive to oscillations driven by ∆m 2 21 and θ 12 , whereas sterile neutrinos affect the overall normalization, degenerate with θ 13 . The matter effect in the sun as well as SNO NC data provide additional signatures of sterile neutrinos, beyond an overall normalization. As we will show in Sec. 4 solar data depend also on the mixing angles θ 24 and θ 34 , controlling the fraction of ν e → ν s transitions, see e.g. [101]. As discussed in appendix C, in the limit ∆m 2 i1 = ∞ for i ≥ 3, solar data depends on 6 real mixing parameters, 1 complex phase and ∆m 2 21 . Hence, in a 3+1 scheme all six mixing angles are necessary to describe solar data in full generality. However, once other constraints on mixing angles are taken into account the effect of θ 24 , θ 34 , and the complex phase are tiny and numerically have a negligible impact on our results. Therefore we set θ 24 = θ 34 = 0 for the solar neutrino analysis in this section. In this limit solar data becomes also independent of the complex phase.
The results of our fit to global ( -) ν e disappearance data are shown in Fig. 2 and the best fit point is given in Tab. 4. For this analysis the mass-squared differences ∆m 2 21 and ∆m 2 31 have been fixed, whereas we marginalize over the mixing angles θ 12 and θ 13 . We see from Fig. 2 that the parameter region favored by short-baseline reactor and Gallium data is well consistent with constraints from long-baseline reactors, KARMEN's and LSND's ν e rate, and with solar and KamLAND data.
Recently, data from the Mainz [104] and Troitsk [105] tritium beta-decay experiments have been re-analyzed to set limits on the mixing of ν e with new eV neutrino mass states. Taking the results of [105] at face value, the Troitsk limit would cut-off the highmass region in Fig. 2 at around 100 eV 2 [106] (above the plot-range shown in the figure). The bounds obtained in [104] are somewhat weaker. The differences between the limits obtained in [104] and [105] depend on assumptions concerning systematic uncertainties and therefore we prefer not to explicitly include them in our fit. The sensitivity of future tritium decay data from the KATRIN experiment has been estimated in [107]. Implications of sterile neutrinos for neutrino-less double beta-decay have been discussed recently in [48][49][50].
Let us now address the question whether the presence of a sterile neutrino affects the determination of the mixing angle θ 13 (see also [99,100]). In Fig. 3 we show the combined determination of θ 13 and θ 14 for two fixed values of ∆m 2 41 . The left panel corresponds to a relatively large value of 10 eV 2 , whereas for the right panel we have chosen the value favored by the global ( -) ν e disappearance best fit point, 1.78 eV 2 . The mass-squared differences ∆m 2 21 and ∆m 2 31 have been fixed, whereas we marginalize over the mixing angle θ 12 . We observe a clear complementarity of the different data sets: SBL reactor and Gallium data determine |U e4 |, since oscillations are possible only via ∆m 2 41 , all other mass-squared differences are effectively zero for them. For LBL reactors ∆m 2 41 can be set to infinity, ∆m 2 31 is finite, and ∆m 2 21 is effectively zero; therefore they provide an unambiguous determination of θ 13 by comparing near and far detector data. The upper bound on |U e4 | from LBL reactors is provided by Chooz, Palo Verde, DoubleChooz, since for those experiments also information on the absolute flux normalization can be used, as mentioned above. In contrast, for solar neutrinos and KamLAND, both ∆m 2 41 and ∆m 2 31 are effectively infinite, and θ 13 and θ 14 affect essentially the overall normalization and are largely degenerate, as visible the figure.
In conclusion, the θ 13 determination is rather stable with respect to the presence of sterile neutrinos. We note, however, that its interpretation becomes slightly more complicated. For instance, in the 3+1 scheme using the parametrization from Tab. 2, the relation between mixing matrix elements and mixing angles is |U e3 | = cos θ 14 sin θ 13 and |U e4 | = sin θ 14 . Hence, the one-to-one correspondence between |U e3 | and θ 13 as in the three-flavor case is spoiled. 4 ν µ ,ν µ , and neutral-current disappearance searches In this section we discuss the constraints on the mixing of ( -) ν µ and ( -) ν τ with new eV-scale mass states. In the 3+1 scheme this is parametrized by |U µ4 | and |U τ 4 |, respectively. In terms of the mixing angles as defined in Eq. (2.6) we have |U µ4 | = cos θ 14 sin θ 24 and |U τ 4 | = cos θ 14 cos θ 24 sin θ 34 . In the present paper we include data sets from the following experiments to constrain ( -) ν µ and ( -) ν τ mixing with eV states: • SBL ν µ disappearance data from CDHS [108]. Details of our simulation are given in [79].
• Super-Kamiokande. It has been pointed out in [109] that atmospheric neutrino data from Super-Kamiokande provide a bound on the mixing of ν µ with eV-scale mass states, i.e., on the mixing matrix elements |U µ4 |, |U µ5 |. In addition, neutral-current matter effects provide a constraint on |U τ 4 |, |U τ 5 |. A discussion of the effect is given in the appendix of [33]. Details on our analysis and references are given in appendix D.
• MINOS [42,43]. The MINOS long-baseline experiment has published data on charged current (CC) ( -) ν µ disappearance as well as on the neutral current (NC) count rate. Both are based on a comparison of near and far detector measurements. In addition to providing the most precise determination of ∆m 2 31 (from CC data), those data can also be used to constrain sterile neutrino mixing, where CC (NC) data are mainly relevant for |U µ4 |, |U µ5 | (|U τ 4 |, |U τ 5 |). See appendix E.5 for details.
Limits on the |U µi | row of the mixing matrix come from ( -) ν µ disappearance experiments. In a 3+1 scheme the ( -) ν µ SBL disappearance probability is given by   2 41 at 99% CL (2 dof) from CDHS, atmospheric neutrinos, MiniBooNE disappearance, MINOS CC and NC data, and the combination of them. We minimize with respect to |U τ 4 | and the complex phase ϕ 24 . In red we show the region preferred by LSND and MiniBooNE appearance data combined with reactor and Gallium data on ( -) ν e disappearance, where for fixed |U µ4 | 2 we minimize with respect to |U e4 | 2 . Right: Constraints in the plane of |U τ 4 | 2 and ∆m 2 41 at 99% CL (2 dof) from MINOS CC + NC data (green) and the combined global ( -) ν µ and NC disappearance data (blue region, black curves). We minimize with respect to |U µ4 | and we show the weakest ("best phase") and strongest ("worst phase") limits, depending on the choice of the complex phase ϕ 24 . In both panels we minimize with respect to ∆m 2 31 , θ 23 , and we fix sin 2 2θ 13 = 0.092 and θ 14 = 0 (except for the evidence regions in the left panel).
where we have defined an effective ( -) ν µ disappearance mixing angle by i.e., in our parametrization (2.6) the effective mixing angle θ µµ depends on both θ 24 and θ 14 .
In contrast to the ν e disappearance searches discussed in the previous section, experiments probing ( -) ν µ disappearance have not reported any hints for a positive signal. We show the limits from the data listed above in the left panel of Fig. 4. Note that the MINOS limit is based on the comparison of the data in near and far detectors. For ∆m 2 41 ∼ 10 eV 2 oscillation effects become relevant at the near detector, explaining the corresponding features in the MINOS bound around that value of ∆m 2 41 , whereas the features around ∆m 2 41 ∼ 0.1 eV 2 emerge from oscillation effects in the far detector. The roughly constant limit in the intermediate range 0.5 eV 2 ∆m 2 41 3 eV 2 corresponds to the limit ∆m 2 41 ≈ 0 (∞) in the near (far) detector adopted in [42,43]. In that range the MINOS limit on |U µ4 | is comparable to the one from SuperK atmospheric data. For ∆m 2 41 1 eV 2 the limit is dominated by CDHS and MiniBooNE disappearance data.
In Fig. 4 (left) we show also the region preferred by the hints for eV-scale oscillations from LSND and MiniBooNE appearance data (see next section) combined with reactor and Gallium data on ( -) ν e disappearance. For fixed |U µ4 | 2 we minimize the corresponding χ 2 with respect to |U e4 | 2 to show the projection in the plane of |U µ4 | 2 and ∆m 2 41 . The tension between the hints in the ν µ data is clearly visible in this plot. We will discuss this conflict in detail in section 6.
Limits on the mixing of ν τ with eV-scale states are obtained from data involving information from NC interactions, which allow to distinguish between 5 The relevant data samples are atmospheric and solar neutrinos (via the NC matter effect) and MINOS NC data. Furthermore, the parameter |U τ 4 | controls the relative weight of the oscillation modes ν µ → ν τ and ν µ → ν s at the "atmospheric" scale ∆m 2 31 : a large value of |U τ 4 | implies a large fraction of ν µ → ν s oscillations at the ∆m 2 31 scale. The limit in the plane of |U τ 4 | 2 and ∆m 2 41 is shown in the right panel of Fig. 4. As follows from Eq. (2.4) (see also appendix A), in the LBL approximation relevant for MINOS NC data a complex phase enters the oscillation probabilities, corresponding to the combination arg(U * µ4 U τ 4 U µ3 U * τ 3 ). In our calculations we take the rotation matrix V 24 to be complex and use the phase ϕ 24 to parametrize this phase. In Fig. 4 we illustrate the impact of this phase by showing the strongest and weakest limit obtained when varying ϕ 24 . We observe that the limit from MINOS depends quite significantly on this phase. The different shapes of the "best phase" and "worst phase" regions emerge from the different properties of CC and NC data. For the weakest limit ("best phase") the fit uses the freedom of the term including the complex phase, which implies that a finite value of θ 24 (or |U µ4 |) is adopted, subject to the constraint from MINOS CC data. Therefore the same structure as in the left panel of Fig. 4 becomes visible also in limit on |U τ 4 |. If we force the phase to take on a value not favored by the fit, a smaller χ 2 is obtained for θ 24 close to zero, which implies that the phase actually becomes unphysical. In this case CC data are not important for the limit on |U τ 4 |, which then is dominated by NC data. Because of the much worse energy reconstruction for NC events compared to CC ones, the features induced by finite values of ∆m 2 41 in either the far or near detector become to a large extent washed out.
The global limit on |U τ 4 | is actually dominated by atmospheric neutrino data and shows only a very weak dependence on the complex phase. In our atmospheric neutrino analysis the information on |U τ 4 | enters via the NC matter effect induced by the presence of sterile neutrinos. A large value of |U τ 4 | would imply a significant matter effect in ∆m 2 31 driven ( -) ν µ disappearance, which is not consistent with the zenith angle distribution observed in SuperK. We find the limit from global data, largely independent of ∆m 2 41 as well as complex phases. Fig. 5 shows the constraints in the plane of |U µ4 | 2 and |U τ 4 | 2 for three fixed values of ∆m 2 41 . We observe the comparable bound on |U µ4 | 2 from MINOS (mainly CC data) and atmospheric, which however is superseded by CDHS, MiniBooNE for ∆m 2 41 1 eV 2 (left and middle panels). Those latter data however, do not provide any constraint on |U τ 4 | 2 , where the global bound is dominated by atmospheric neutrinos for all values of ∆m 2 41 of interest. We also observe that solar neutrinos provide a bound on |U τ 4 | 2 of similar strength ( -) ν µ disappearance + LBL reactors (red), and the combination of those data (blue). The constraint from solar neutrinos is shown in magenta. Regions are shown at 90% and 99% CL (2 dof) with respect to the χ 2 minimum at the fixed ∆m 2 41 . We minimize with respect to complex phases and include effects of θ 13 and θ 14 where relevant. The gray region is excluded by the unitarity requirement |U µ4 | 2 + |U τ 4 | 2 ≤ 1. Note the different scale on the axes.
as MINOS data, thanks to the NC matter effect and SNO NC data. No relevant limit can be set on |U µ4 | 2 from solar neutrinos. 5 ν µ → ν e andν µ →ν e appearance searches Now we move on to the discussion of appearance searches. In contrast to disappearance experiments which probe only one row of the mixing matrix, i.e., only the elements |U αi | for fixed α, an appearance experiment in the channel ( -) ν α → ( -) ν β is sensitive to two rows via combinations like |U αi U βi | and potentially to some complex phases. In the SBL approximation the 3+1 appearance probability in the phenomenologically most relevant channel ( -) ν µ → ( -) ν e takes the form where we have defined an effective mixing angle by In the parametrization from Eq. (2.6) we obtain sin 2θ µe = sin θ 24 sin 2θ 14 . The oscillation probability in the 3+2 scheme is given in Eq. (2.1). The 3+1 SBL appearance probability does not depend on complex phases, whereas in the 3+2 scheme CP violation via complex phases is possible at SBL [33,60]. Our analyses of LSND [12], KARMEN [118], NOMAD [119] ( -) ν µ → ( -) ν e appearance data are based on [33,79,120], where references and technical details can be found. Our analyses of E776 [40] and ICARUS [41], used for the first time in the present paper, are described in appendices E.2 and E.3, respectively. 6 In the case of LSND, we use only the decay-at-rest (DAR) data which are most sensitive to oscillations. Decay-in-flight (DIF) data on ν µ → ν e are consistent with the signal seen in DAR data, however the significance of the oscillation signal for DIF is much less than for DAR. A combined DAR-DIF analysis in a two-neutrino framework would shift the allowed region to somewhat smaller values of the mixing angle. A detailed discussion of LSND DAR versus DIF in the context of 3+1 neutrino oscillations can be found in [35].
In our analysis of the MiniBooNE ν e andν e appearance search we use the latest data 7 from [16], following closely the analysis instructions provided by the collaboration. Details are given in appendix E.4. Since their very first data release in 2007 [13], MiniBooNE observe an excess of events over expected background in the low energy ( 500 MeV) region of the event spectrum [122]. Since the spectral shape of the excess is difficult to explain in a two-flavor oscillation framework, historically the analysis window has been (somewhat artificially) divided into a low energy region containing the excess events and a high energy part with no excess. 8 Preliminary results from anti-neutrinos showed also some indication for an event excess in the high energy part of the spectrum [14] which indicated the need for CP violation in order to reconcile neutrino and anti-neutrino data. However, for the most recent data [15,16] the shapes of the neutrino and anti-neutrino spectra appear to be consistent with each other, showing excess events below around 500 MeV and data consistent with background in the high energy region, see Fig. 6. In our work we always analyse the full energy spectrum for both neutrinos and anti-neutrinos. Contrary to the analysis of the MiniBooNE collaboration we take into account oscillations of all background components in a consistent way, according to the particular oscillation framework to be tested, see appendix E.4 for details.
In Fig. 7 we show a summary of the ( -) ν µ → ( -) ν e data in the 3+1 scheme. We observe an allowed region from MiniBooNE anti-neutrino data that is driven by the event excess below around 800 MeV and has significant overlap with the parameter region preferred by LSND. At the 99% CL shown in the figure, MiniBooNE neutrino data give only an upper bound, although we find closed regions (again driven by the low-energy excess) at lower confidence levels. This is in qualitative agreement with the results obtained by the MiniBooNE collaboration, compare Fig. 4 of [16] or Fig. 3 of [17]. The different shape of our regions is due to the oscillations of the background components. Those can be relatively large in an appearance only fit, since for fixed sin 2 2θ µe we allow |U µ4 | and |U e4 | to vary freely, subject to the constraint Eq. (5.2). We have checked that when we adopt the same assumptions as the MiniBooNE collaboration we recover their regions/bounds with good accuracy.
The recent constraint on ν µ → ν e appearance from ICARUS [41] at long-baseline leads to a bound on sin 2 2θ µe essentially independent of ∆m 2 41 in the range shown here. It excludes in particular the region of large mixing and low ∆m 2 41 that is otherwise unconstrained by appearance experiments. 9 An important background for the ∆m 2 41 driven 7 The recent updated analysis from MiniBooNE [17] is based on the same data as [16], corresponding to 6.46 × 10 20 protons on target in neutrino mode and 11.27 × 10 20 protons on target in anti-neutrino mode. 8 The importance of energy reconstruction effects for the low energy excess has been pointed out in Refs. [123,124], see also [17]. 9 Note that this region is also excluded by νe and νµ disappearance searches once Eq. (5.2) is used to relate sin 2 2θµe to the effective mixing angles probed by the disappearance experiments.   Figure 6. MiniBooNE neutrino (left) and anti-neutrino (right) data compared to the predicted spectra for the 3+1, 3+2, and 1+3+1 best fit points for the combined appearance data (the data set used in Fig. 7) and global data including disappearance. Shaded histograms correspond to the unoscillated backgrounds. The predicted spectra include the effect of background oscillations. The corresponding χ 2 values (for combined neutrino and anti-neutrino data) are also given in the plot.
ν e appearance experiments in the 3+1 scheme. We show the regions from LSND and MiniBooNE anti-neutrino data and the bounds from MiniBooNE neutrinos, KARMEN, NOMAD, ICARUS, and E776. The latter is combined with LBL reactor data in order to constrain the oscillations of the ( -) ν e backgrounds; this leads to a non-vanishing bound on sin 2 2θ µe from E776 at low ∆m 2 41 . The red region corresponds to the combination of those data, with the star indicating the best fit point. ν µ → ν e search in ICARUS are ν e appearance events due to ∆m 2 31 and θ 13 . Furthermore, as discussed in section 2 and appendix A the long-baseline appearance probability in the 3+1 scheme depends on one complex phase. In deriving the ICARUS bound shown in  Table 6. Individual contributions to the χ 2 at the best fit point of the combined appearance data for 3+1, 3+2, and 1+3+1. The individual χ 2 values do not add up to the number for the combined fit given in the last row because of correlations between MiniBooNE neutrino and anti-neutrino data. Fig. 7 we fix the parameters sin 2 2θ 13 = 0.092 and ∆m 2 31 = 2.4 × 10 −3 eV 2 but marginalize over the relevant complex phase.
As visible in Fig. 7 there is a consistent overlap region for all ν e experiments and we can perform a combined analysis. The resulting region is shown in red in Fig. 7. The best fit point is at sin 2 2θ µe = 0.013, ∆m 2 41 = 0.42 eV 2 with χ 2 min /dof = 87.9/(68 − 2) dof (GOF = 3.7%). The no-oscillation hypothesis is excluded with respect to the best fit point with ∆χ 2 = 47.7. This large value is mostly driven by LSND. The relatively low GOF comes mainly from MiniBooNE neutrino data, as can be seen from Tab. 6, where we list the individual contribution of the experiments to the total appearance χ 2 . This is also obvious from Fig. 6, showing that at the 3+1 appearance best fit point (black dotted histogram) the fit to the neutrino spectrum is not very good, predicting too much excess in the region 0.6 − 1 GeV and only partially explaining the excess in the data below 0.4 GeV.
Analysing the same data in the 3+2 scheme we find a best fit point at ∆m 2 41 = 0.57 eV 2 , ∆m 2 51 = 1.24 eV 2 , with χ 2 min /dof = 72.7/(68 − 5) (GOF = 19%). The GOF improves considerably with respect to 3+1. We find For 3 dof (corresponding to the 3 additional SBL appearance parameters in 3+2) this implies that appearance data favor 3+2 over 3+1 at the 99.8% CL. From Tab. 6 we see that basically all experiments have a reasonable χ 2 /dof value (maybe with the exception of E776, which intrinsically has a somewhat high χ 2 ). In particular MiniBooNE neutrino data improve by 8.7 units compared to 3+1. This is also visible in Fig. 6, with the red dotted curve (3+2 appearance best fit) showing a much better fit than the black dotted one (3+1 appearance best fit), with χ 2 = 24 for 22 dof for the joint MiniBooNE neutrino and anti-neutrino data. The appearance data fit in a 1+3+1 scheme is similar to the 3+2 case, with a slightly better fit for LSND and MiniBooNE neutrino, and a slightly worse fit for MiniBooNE anti-neutrino data, compare Tab. 6. The predicted MiniBooNE spectra at the 1+3+1 appearance best fit are shown as blue dotted histograms in Fig. 6. We find for 1+3+1 χ 2 min /dof = 74.6/(68 − 5) (GOF = 15%) and

Combined analysis of global data
We now address the question whether the hints for sterile neutrino oscillations discussed above can be reconciled with each other as well as with all existing bounds within a common sterile oscillation framework. In section 6.1 we discuss the 3+1 scenario, whereas in section 6.2 we investigate the 3+2 and 1+3+1 schemes.

3+1 global analysis
In the 3+1 scheme, SBL oscillations are described by effective 2-flavor oscillation probabilities, involving effective mixing angles for each oscillation channel. The expressions for the effective angles θ ee , θ µµ , θ µe governing the ( -) ν e disappearance, ( -) ν µ disappearance, and ( -) ν µ → ( -) ν e appearance probabilities are given in Eqs. (3.2), (4.2), (5.2), respectively. From those definitions it is obvious that the three relevant oscillation amplitudes are not independent, since they depend only on two independent fundamental parameters, namely |U e4 | and |U µ4 |. Neglecting terms of order |U α4 | 4 (α = e, µ) one finds sin 2 2θ µe ≈ 4 sin 2 2θ ee sin 2 2θ µµ . (6.1) Hence, the appearance amplitude relevant for the LSND/MiniBooNE signals is quadratically suppressed by the disappearance amplitudes, which both are constrained to be small. This leads to the well-known tension between appearance signals and disappearance data in the 3+1 scheme, see e.g. [29,30] for early references. This tension is illustrated for the latest global data in the left panel of Fig. 8, where we show the allowed region for all appearance experiments (the same as the combined region from Fig. 7), compared to the limit from disappearance experiments in the plane of sin 2 2θ µe and ∆m 2 41 . The preferred values of ∆m 2 41 for disappearance data come from the reactor and Gallium anomalies. The regions for disappearance data, however, are not closed in this projection in the parameter space and include sin 2 2θ µe = 4|U e4 U µ4 | 2 = 0, which always can be achieved by letting U µ4 → 0 because of the non-observation of any positive signal in SBL ( -) ν µ disappearance. The upper bound on sin 2 2θ µe from disappearance emerges essentially as the product of the upper bounds on |U e4 | and |U µ4 | from ( -) ν e and ( -) ν µ disappearance according to Eq. (6.1). We observe from the plot the clear tension between those data sets, with only marginal overlap regions at above 99% CL around ∆m 2 41 ≈ 0.9 eV 2 and at 3σ around ∆m 2 41 ≈ 6 eV 2 . The tension between disappearance and appearance experiments can be quantified by using the so-called parameter goodness of fit (PG) test [35,125]. It is based on the χ 2 definition χ 2 PG ≡ χ 2 min,glob − χ 2 min,app − χ 2 min,dis = ∆χ 2 app + ∆χ 2 dis , where χ 2 min,glob is the χ 2 minimum of the global data combined, χ 2 min,app and χ 2 min,dis are the minima of appearance and disappearance data taken separately, and χ 2 x,glob is χ 2 x evaluated at the best fit point of the global data. χ 2 PG should be evaluated with the number of dof corresponding to the number of parameters in common between appearance and disappearance data (2 in the case of 3+1). From the numbers given in Tab. 7 we observe that the global 3+1 fit leads to χ 2 min /dof = 712/680 with a p-value 19%, whereas the PG test indicates that appearance and disappearance data are consistent with each other only with a p-value of about 10 −4 . The strong tension in the fit is not reflected in the global χ 2 minimum, since there is a large number of data points not sensitive to the tension, which leads to the "dilution" of the GOF value in the global fit, see [125] for a discussion. In contrast, the PG test is designed to test the consistency of different parts of the global data.
The conflict between the hints for eV 2 -scale oscillations and null-result data is also illustrated in the right panel of Fig. 8   . Allowed regions in the plane of |∆m 2 41 | and |∆m 2 51 | in 3+2 (upper-left part) and 1+3+1 (lower-right part) mass schemes. We minimize over all mixing angles and phases. We show the regions for appearance data (light blue) and disappearance data (light green) at 95% CL (2 dof), and global data (dark and light red) at 95% and 99% CL (2 dof). data. We find no overlap region at 99% CL. Hence, an explanation of all anomalies within the 3+1 scheme is in strong tension with constraints from various null-result experiments.

3+2 and 1+3+1 global analyses
Now we move to the global analysis within a two-sterile neutrino scenario in order to investigate whether the additional freedom allows to mitigate the tension in the fit. We give χ 2 and PG values for the 3+2 and 1+3+1 schemes in Tab. 7 and the corresponding values of the parameters in Tab. 8. We observe from the PG values that the tension between appearance and disappearance data remains severe, especially for the 3+2 case, with a PG value below 10 −4 , even less than for 3+1. For 1+3+1 consistency at the 2 per mille level can be achieved.
Let us first discuss the 3+2 fit. We find a modest improvement of the total χ 2 in the global fit compared to 3+1 by  and ∆m 2 51 at 90% and 99% CL (2 dof). We minimize over all undisplayed mixing parameters. We show the regions for appearance data (blue), disappearance data (green), and the global data (red).
Evaluated for 4 additional parameters relevant for SBL data in 3+2 compared to 3+1 this corresponds to 96.9% CL.
The origin of the very low parameter goodness of fit can be understood by looking at the contributions of appearance and disappearance data to χ 2 PG . Tab. 7 shows that the χ 2 of appearance data at the global best fit point, χ 2 app,glob , changes only by about 3 units between 3+1 and 3+2. However, if appearance data is fitted alone, an improvement of 15.2 units in χ 2 is obtained when going from 3+1 to 3+2, see Eq. (5.3). The fact that appearance data by themselves are fitted much better in 3+2 than in 3+1 leads to the large value of χ 2 PG = 25.8, with a contribution of 19.7 from appearance data. In other words: the fit to appearance data at the global 3+2 best fit point (χ 2 app,glob = 92.4/68, p-value 2.6%) is much worse than at the appearance-only 3+2 best fit point (χ 2 min,app /dof = 72.7/63, p-value 19%). This interpretation is also supported by Fig. 6, showing an equally bad fit to MiniBooNE neutrino data at the 3+1 and 3+2 global best fit points (black solid and red solid histograms, respectively).
We further investigate the origin of the tension in the 3+2 fit in Figs. 9 and 10. In Fig. 9 we show the allowed regions in the multi-dimensional parameter space projected onto the plane of the two mass-squared differences for appearance and disappearance data separately, as well as the combined region. The 3+2 global best fit point happens close to an overlap region of appearance and disappearance data at 95% CL in that plot. However, an overlap in the projection does not imply that the multi-dimensional regions overlap. In the left panel of Fig. 10 we fix the mass-squared differences to values close to the global 3+2 best fit point and show allowed regions in the plane of |U e4 U µ4 | and |U e5 U µ5 |. These are the 5-neutrino analogs to the 4-neutrino SBL amplitude sin 2θ µe . Similar as in the 3+1 case we observe a tension between appearance and disappearance data, with no overlap at 99% CL. This explains the small PG probability at the 3+2 best fit point. The right panel of Fig. 10 corresponds to the local minimum of the combined fit visible in Fig. 9 -25 -  around ∆m 2 41 = 0.9 eV 2 , ∆m 2 51 = 6 eV 2 . In this case no tension is visible in the mixing parameters shown in Fig. 10, however, from Fig. 9 we see that those values for the masssquared differences are actually not preferred by appearance data, which again leads to a degraded GOF. We conclude that the tension between appearance and disappearance data cannot be resolved in the 3+2 scheme.
For the 1+3+1 ordering of 5-neutrino mass states a somewhat better fit can be obtained. We find χ 2 3+1,glob − χ 2 1+3+1,glob = 17.8 , (6.4) corresponding to disfavoring 3+1 at the 99.9% CL (4 dof) compared to 1+3+1. We observe from Tab. 7 that at the 1+3+1 global best fit point a much better fit to appearance data is obtained than at the 3+2 best fit point (χ 2 app,glob = 82.4 compared to 92.4). As visible from the blue solid histogram in Fig. 6 the lack of an event excess in the MiniBooNE neutrino spectrum around 0.6 GeV is reasonably well reproduced at the 1+3+1 global best fit point, although the low energy excess is still under-predicted. The χ 2 PG for appearance versus disappearance for 1+3+1 is even slightly less than for 3+1 (16.8 versus 18.0). Because of the additional parameters relevant for the evaluation of χ 2 PG the p-value 0.2% is obtained for 1+3+1, about one order of magnitude better than in 3+1.
The projection of the allowed regions on the plane of the mass-squared differences is shown in the lower-right part of Fig. 9. Note that the disappearance regions are to good accuracy symmetric for 3+2 and 1+3+1. This can be understood from Eq. (2.3), where the difference between 3+2 and 1+3+1 appears only in the last term, which is suppressed by the 4th power of small matrix elements, compared to the leading terms at 2nd order. We observe in Fig. 9 that appearance and disappearance regions for 1+3+1 both overlap with the combined best fit point. In Fig. 11 we show again a section through the parameter space at fixed values for the mass-squared differences close to the global best fit point. Although the tension between appearance and disappearance is still visible (no overlap of the 90% CL regions) the disagreement is clearly less severe than in the 3+2 situation shown in the left panel of Fig. 10, and in Fig. 11 we find significant overlap at 99% CL, in agreement with the somewhat improved PG p-value.

Summary and discussion
We have investigated in detail the status of hints for eV 2 -scale neutrino oscillations, namely the indications for ( -) ν e disappearance due to the reactor and Gallium anomalies, and the indications for ( -) ν µ → ( -) ν e appearance from LSND and MiniBooNE. Those hints have been analysed in the context of the global data on neutrino oscillations, including short and longbaseline accelerator and reactor experiments, as well as atmospheric and solar neutrinos. Our main findings can be summarized as follows.
1. For all fits a global χ 2 min /dof ≈ 1 is obtained in our analysis, involving 689 data points in total, see table 7.
2. However, a joint fit of all anomalies suffers from tension between appearance and disappearance data, mainly due to the strong constraints from ( -) ν µ disappearance data.

The tension in the fit is driven by the LSND and MiniBooNE appearance hints, since oscillations in the
ν e channel inevitably predict also a signal in ( -) ν µ disappearance, which is not observed at the relevant L/E scale. 4. In contrast, the reactor and Gallium anomalies are not in direct conflict with other data, since ( -) ν e and ( -) ν µ disappearance at the eV 2 scale are controlled by independent parameters. 5. In a 3+1 scheme the compatibility of appearance and disappearance data is at the level of 10 −4 . The individual allowed regions have marginal overlap at about 99% CL.
6. We do not find a very significant improvement of the fit in a 3+2 scheme compared to 3+1. Based on the relative χ 2 minima, 3+1 is disfavored with respect to 3+2 at 96.9% CL. The compatibility of appearance and disappearance data in 3+2 is even worse than in 3+1, because the fit of appearance data-only is significantly better in 3+2 than in 3+1, however, the appearance fit at the global best fit point is only marginally improved.
7. We find an improvement of the global fit in the 1+3+1 spectrum compared to 3+1, at the 99.9% CL. The compatibility of appearance and disappearance data is still low in 1+3+1, at the level of 0.2%.
Hence, in all cases we find significant tension in the fit, with the marginal exception of the 1+3+1 scheme. At our 1+3+1 best fit point the minimal value for the sum of all neutrino masses would be Σ ≈ 3 |∆m 2 51 | + |∆m 2 41 | + |∆m 2 51 | ≈ 3.2 eV, where we took the values given in Tab. 8 and assumed that the mass-squared difference with the smaller absolute value is negative, using the symmetry 4 ↔ 5 and γ αβ → −γ αβ of SBL data, see Eqs. (2.1) and (2.3). It remains an interesting question whether such a large value of Σ is consistent with cosmology, see e.g. [47,52,53,58,59].
Let us briefly compare our results to two other recent global sterile neutrino fits, from Refs. [34] and [47]. We are in good agreement with the results of [34]. For instance, in Tab. 2 of [34] χ 2 PG values for the consistency of appearance and disappearance data are given, 17.8 for 3+1 and 23.9 for 3+2, which compare well with our numbers from Tab. 7, 18.0 and 25.8, respectively. There is some disagreement with the results of [47]. The corresponding χ 2 PG values reported in their Tab. I are 6.6 and 11.12, which lead to significantly better compatibility of appearance and disappearance data. Comparing Fig. 1 of [47] with our Fig. 8 (left) we observe that our disappearance limits are somewhat stronger and our appearance region is at somewhat larger mixing angles, both effects increasing the tension. Our appearance region is in good agreement with Fig. 6 (left) of [34]. There are some differences between our disappearance region and Fig. 6 (right) of [34], mainly at high ∆m 2 41 . Irrespective of the hints for ( -) ν e disappearance and ( -) ν µ → ( -) ν e appearance, we have derived constraints on the mixing of eV-scale states with the τ -neutrino flavor. Those are dominated by data involving information from neutral-current interactions, which are solar neutrino data (NC matter effect and SNO NC data), MINOS long-baseline NC data, and atmospheric neutrino data (NC matter effect). The global limit is dominated by the latter.
In conclusion, establishing sterile neutrinos at the eV-scale would be a major discovery of physics beyond the Standard Model. At present a consistent interpretation of all data indicating the possible presence of eV-scale neutrino mass states remains difficult. The global fit suffers from tension between different data sets. An unambiguous solution to this problem is urgently needed. We are looking forward to future data on oscillations at the eV 2 scale [38], as well as new input from cosmology.

Acknowledgments
Numerical results presented in this paper have been obtained on computing infrastructure provided by Fermi National Accelerator Laboratory and by Max Planck Institut für Kernphysik. The authors would like to thank the MINOS collaboration for their invaluable help in including their sterile neutrino search in this work. We are especially grateful to Alexandre Sousa and Mary Bishai for sharing their Monte Carlo results. We are grateful to M. Smy for providing assistance on the simulation of SK4 solar data, and to Bill Louis for valuable information on the MiniBooNE analysis.

A Complex phases in sterile neutrino oscillations
In this appendix we discuss in some detail the phases for neutrino oscillations involving s extra sterile neutrino states. For definiteness, we will focus on s = 2; the special case of s = 1 can be easily obtained by dropping all terms containing a redundant "5" index. Let us order the flavor eigenstates as (ν e , ν µ , ν τ , ν s 1 , ν s 2 ) and introduce the following parametrization for the n × n mixing matrix, with n = 3 + s: where V ij represents a complex rotation by an angle θ ij and a phase ϕ ij in the ij plane. Note that rotations involving only sterile states (i.e., V with both , ≥ 4) are unphysical, and therefore we have omitted them from Eq. (A.1). Removing those unphysical angles, U contains n(n − 1)/2 − s(s − 1)/2 = 3(s + 1) physical angles. In Eq. (A.1) we have chosen a priori all rotations to be complex. We present now a method which allows to remove unphysical phases from the mixing matrix in a consistent way. First, we note that a complex rotation can be written as where O ij is a real rotation matrix, D k is a diagonal matrix with (D k ) jj = e iϕ for j = k and (D k ) jj = 1 for j = k. Depending on whether k = i or k = j, the phase in D k is either ±ϕ ij . Second, we note that phase matrices D k at the very left or right of the matrix U drop out of oscillation probabilities and are therefore unphysical. 10 Hence, we have to represent all matrices V ij in Eq. (A.1) using Eq. (A.2), and then try to commute as many phase matrices to the left and the right. The matrix D k commutes with a matrix O ij if k = i and k = j. Furthermore, if k = i or k = j we can commute D k with a complex matrix V ij by re-defining the phase ϕ ij : e.g., V ij (θ ij , ϕ ij )D i = D i V ij (θ ij , ϕ ij ). However, we cannot commute D k with a real matrix O ij if k = i or k = j. This leads to the following rule for removing phases. Let us start by removing one phase, let's take for instance ϕ 12 , obtaining a real V 12 → O 12 . Then we can no longer use the matrices D 1 and D 2 to remove phases, since we cannot commute them with O 12 to the very left or right of U . But, we can use for instance D 3 to remove one of the remaining phases ϕ i3 , and so forth. Hence, we can remove in total n − 1 phases. Starting with all 3(s + 1) physical angles complex, we obtain that there are 3(s + 1) − (n − 1) = 2s + 1 physical phases, i.e., 1 phase for no sterile neutrinos, 3 phases for the 3+1 spectrum, and 5 phases for the 3+2 spectrum. Those remaining phases cannot be associated arbitrarily to the V ij but only in a way which is consistent with the above prescription to remove phases. In particular, it is not possible to make simultaneously three rotation matrices ij, ik, kj real. One possible choice is the one given in Eq. (2.6). Using this recipe to remove phases it is also straightforward to obtain the physical phases in case of the SBL or LBL approximations according to Tab. 2.
In the SBL approximation for a 3+2 scheme, only two physical phases remain. In the parametrization invariant notation from Eqs. (2.1) and (2.2), they are given by γ µe and γ µτ . Since the only SBL appearance experiments we consider are studying the ( -) ν µ → ( -) ν e oscillation channels only the phase γ µe is relevant for our analysis. In the specific parametrization from Table 2, the physical phases have been chosen as ϕ 25 and ϕ 35 . Since ϕ 35 does not appear in the parametrization independent representation of γ µe according to Eq. (2.2) we can remove it from our SBL analysis without loss of generality.
In the LBL limit, more phases are phenomenologically relevant. In particular, Eq. (2.4) shows that the oscillation probabilities in the 3+2 case are sensitive to the parametrization independent phases with I αβij defined in Eq. (2.2). The experiments for which the LBL approximation is relevant are ICARUS and MINOS. ICARUS searches for ν µ → ν e transitions, whereas the NC data in MINOS are sensitive to the combination α=e,µ,τ P νµ→να . Therefore, for our analyses the two appearance channels (αβ) = (µe) and (µτ ) are relevant, leading, according to Eq. (A.3), to four independent phases, in agreement with Tab. 2. 11 The particular parametrization from the table implies that for the ν µ → ν e channel only the phases ϕ 13 and ϕ 25 are relevant, whereas the ν µ → ν τ channel is also sensitive to ϕ 35 and ϕ 34 . From the way we have chosen the complex rotations in Tab. 2 the correct phases in the 3+1 case are automatically obtained by dropping all rotations including the index "5" in the 3+2 mixing matrix. We recover the well-known result that in the SBL approximation in a 3+1 scenario no complex phase appears. In the LBL approximation two phases remain, corresponding to the combinations arg(U * µ4 U e4 U µ3 U * e3 ) and arg(U * µ4 U τ 4 U µ3 U * τ 3 ), which can be parametrized by using the phases ϕ 34 and ϕ 13 , where for the ν µ → ν e channel only ϕ 13 is relevant.
Let us comment also on the role of phases in solar and atmospheric neutrinos. As shown in appendix C solar neutrinos do depend on one effective complex phase. This is included in our analysis in full generality however the numerical impact of this phase dependence is small. It has been shown in [33] (appendix C) that the impact of complex phases on atmospheric neutrinos is very small and we neglect their effect in the current analysis.

B Systematic uncertainties in the reactor analysis
The correlation of errors between SBL reactors are quite important in order to obtain the significance of the reactor anomaly. Here we describe our error prescription for the SBLR analysis. From the errors quoted in the original publications we extracted the following components. First we removed the uncertainty on the neutrino flux prediction, since we include this uncertainty in a correlated way for all reactor experiments based on 11 In deriving Eq. (2.4) we have assumed that ∆m 2 41 , ∆m 2 51 , ∆m 2 54 are infinite. Note that this assumption does not reduce the number of physical phases further, since also the general procedure used in Tab. 2 (assuming only ∆m 2 21 = 0) leads to the same number of physical phases as Eq. (2.4).
the prescription given in [21] (see below). The remaining error is divided into uncorrelated errors (including statistical as well as experimental contributions) as well as correlated errors between some SBLR measurements. The total uncorrelated error is shown in the last column of Tab. 3. Below we give details on our assumptions on correlations. The total error on the measured cross section per fission in Bugey4 is 1.38% [66]. It receives contributions which are reactor/site specific (1.09%) as well as detector specific (0.84%). Rovno91 [67] used the same detector as Bugey4. The errors on the experimental cross section comes from the reactor and geometry (2.1%) and the latter from the detector (1.8%). So the first one should be uncorrelated whereas the second one should be correlated with the corresponding one from Bugey4. Hence we have σ uncor Bugey4 = 1.09%, σ cor Bugey4/Rovno91 = 0.84%, σ uncor Rovno91 = 2.1%, σ cor Rovno91/Bugey4 = 1.8%. The Bugey3 measurement consists of 3 detectors at the distances 15 m, 40 m, 95 m. In Tab. 9 of [68] systematic errors of 5% (absolute) and 2% (relative) are quoted. The uncorrelated errors given in our Tab. 3 are obtained by adding the statistical error (Tab. 10 of [68]) to the 2% relative systematic error. For the correlated error we remove the relative systematic error as well as 2.4% for the flux prediction and obtain σ cor Bugey3 = 3.9%, which we take fully correlated between the 3 rate measurements. In cases when we include the spectral data from Bugey3 we use 2% (3.9%) as uncorrelated (correlated) normalization errors for the three spectra. Details of our spectral analysis of Bugey3 can be found in [79].
In Goesgen the same detector was used at three different distances. In Tab. V of [69] the individual and correlated errors are given. The values for the uncorrelated errors used in our analysis (see Tab. 3) are obtained by adding the statistical and uncorrelated systematic errors in quadrature and expressed in percentage of the ratio. Then [69] quotes a correlated error of 6%, which includes 3% from the neutrino spectrum, 2% from the cross section, 3.8% from efficiency, 2% from reactor power, and a few more < 1%. We remove the 3% neutrino spectrum, as well as the 2% from cross section (this seems way too large). This gives σ cor Goesgen = 4.8%. Part of this error is supposed to be correlated with ILL, since they used a "nearly identical" detector. Removing the reactor power of 2% we get σ cor Goesgen/ILL = 4.36%. In the ILL paper [70] errors of 3.66% statistical and 11.5% systematical are quoted. The contributions to the systematic error are given as 6.5% on the "intensity of the anti-neutrino energy spectrum", 8% detection efficiency, 1.2% neutron life time and some other smaller contributions. In the lack of detailed information we proceed as follows. We remove 3% for the flux uncertainties (the same as in Goesgen) and take 8% (the detection efficiency) to be correlated with Goesgen. This gives σ uncor ILL = 8.52% and σ cor ILL/Goesgen = 8%, where the uncorrelated error includes also the statistical one. We have checked that other "reasonable" assumptions on the ILL/Goesgen correlation do not change our results significantly.
From Krasnoyarsk [71,72] there are three data points based on a single detector, which records events from 2 "identical" reactors. In [71] from 1987, results at distances of 32.8 m and 92.3 m are reported. The statistical errors are 3.55% and 19.8%, respectively, and the systematical error are 4.84% and 4.76%, respectively, which include detector effects (∼ 3%), reactor power (∼ 3%) and the effective distance (∼ 1%). We take systematical errors fully correlated between those two data points. Then there is a measurement from 1994 [72] at 57 m. The errors include detector uncertainty (3.4%), reactor power (2.5%), and statistics (0.95%). We assume the detector error to be correlated with the 1987 data points but include the reactor power in the uncorrelated error.
For SRP [73] measurements at the distances of 18 m and 24 m are reported from the same detector, which has been moved between the two positions. The obtained ratios of data over expectation at the two distances are 98.7% ± 0.6%(stat.) ± 3.7%(syst.) and 105.5% ± 1.0%(stat.) ± 3.7%(syst.). The uncorrelated systematic error is derived from the ratio of the two spectra, 1.61 ± 0.02(stat) ± 0.03(syst), with an expectation of 1.73 [73]. Hence 1.86% = 0.03/1.61 is an uncorrelated systematic error. Then we remove the 2.5% from the neutrino spectrum from the systematical error and obtain σ uncor SRP1 = 1.95%, σ uncor SRP2 = 2.11%, and σ cor SRP = 2.0%. With this assumption on the uncorrelated errors the two data points are consistent at about 2.4σ.
Rovno88 [74] reports 5 measurements with two different detectors: 1I, 2I, 1S, 2S, 3S, where the "I" experiments use an integral neutron detector, whereas the "S" experiments use a scintillation detector measuring the positron spectrum. In Tab. III of [74] for each measurement two systematical errors are given, 2.2% for "the uncertainty in the measured reactor power and the geometric uncertainty", and a second uncertainty due to "errors in the detector characteristics and fluctuations". From Tab. II one finds that statistical errors are negligible. In the absence of detailed information we assume the 2.2% uncertainty fully correlated among all experiments. From the second error we assume that half of it is uncorrelated and the other half is correlated among detectors of the same type. We have checked that our results do not depend significantly on those assumptions.
Finally let us comment on the uncertainty on the neutrino flux predictions. As mentioned above this uncertainty has been removed from the SBLR experimental errors since they are treated in a correlated way for all reactor experiments. For the uncertainties of the fluxes from 235 U, 239 Pu, 241 Pu we use the information from tables provided in [21]. The uncertainty is provided as uncorrelated error in each bin of neutrino energy as well as fully correlated (between energy bins as well as the three isotopes) errors. For the uncorrelated errors we proceed as follows. We perform a fit of a polynomial of 2nd order to the numbers given in [21]. Then those coefficients are used as pulls in the χ 2 analysis constrained by the covariance matrix obtained from the polynomial fits. This allows us to take into account the fact that the bin-to-bin uncorrelated errors of the neutrino spectrum will lead to correlated effects in the observed positron spectra. Since the uncorrelated flux errors are sub-leading compared to the correlated ones the parametrization with a 2nd order polynomial is sufficiently accurate. To include the correlated errors we follow [21]: the various contributions to this error in each neutrino energy bin are symmetrized and added in quadrature. Then we obtain an energy dependent fully correlated error for the spectra from 235 U, 239 Pu, 241 Pu which is included as one common pull parameter in the global reactor χ 2 . For the neutrinos from 238 U we use the flux from [20] and include a global normalization error on the 238 U induced events of 8.15% [26].

C Solar neutrino analysis
In the analysis of solar neutrino experiments we include the total rates from the radio chemical experiments Chlorine [126], GALLEX/GNO [85] and SAGE [127]. Regarding real-time experiments, we include the electron scattering energy-zenith angle spectrum data from all the Super-Kamiokande phases I-IV [128][129][130][131] and the data from the three phases of SNO [132][133][134], including the results on the low energy threshold analysis of the combined SNO phases I-III [135]. We also include the main set of the 740.7 days of Borexino data [136] as well as their high-energy spectrum from 246 live days [137]. In total the solar neutrino data used in our analysis consists of 261 data points.
Let us now focus on the probabilities relevant for the analysis of solar neutrino experiments. We will assume that only the first two mass eigenstates are dynamical, while the others are taken to be infinite. Since physical quantities have to be independent of the parameterization of the mixing matrix, we will use the freedom in choosing a parameterization that makes analytical expressions particularly simple. We start from the Hamiltonian in the flavor basis: where ∆ = diag(0, ∆m 2 21 , ∆m 2 31 , . . . )/2E and V = √ 2 G F diag(2N e , 0, 0, N n , . . . )/2. It is convenient to write U =Ũ U 12 , where U 12 is a complex rotation by an angle θ 12 and a phase δ 12 which we will define later. 12 Then we can write: In order to further simplify the analysis, let us now assume that all the mass-squared differences involving the "heavy" states ν h with h ≥ 3 can be considered as infinite: ∆m 2 hl → ∞ and ∆m 2 hh → ∞ for any l = 1, 2 and h, h ≥ 3. In leading order, the matrixH takes the effective block-diagonal form:H where H (2) is the 2 × 2 sub-matrix ofH corresponding to the first and second neutrino states, and ∆ (s) = diag(∆m 2 31 , ∆m 2 41 , . . . )/2E is a diagonal (s + 1) × (s + 1) matrix (the matter terms in this block are negligible in the limit of very large ∆m 2 hh ). Consequently, the evolution matrix is: We are interested only in the elements S αe . It is convenient to define θ 12 in such a way thatŨ e2 = 0. Taking into account the block-diagonal form ofS, we obtain: The expressions for the probabilities, P αe = |S αe | 2 , are straightforward:

S
(2) 21 (C.6) Here we have used the fact that the terms containing a factor e −i∆ hh L oscillate very fast, and therefore vanish once the finite energy resolution of the detector is taken into account. For solar neutrino experiments we only need P ee and P ae ≡ P ee + P µe + P τ e . It is therefore convenient to define δ 12 in such a way that σ=stŨ σ1Ũ σ2 is a real number. Let us also define: Using unitarity relations, α |Ũ αi | 2 = 1 and αŨ α1Ũ α2 = 0, we obtain: where In the above expressions P (2) osc and P (2) int are effective terms derived from the Hamiltonian H (2) , which has the form: with the vacuum term including the phase δ 12 :
To perform the analysis we map the parameters which are used for the analysis into the effective parameters for solar neutrinos. In the case of 3+1 the number of real mixing parameters is actually the same as the number of mixing angles in the most general parametrization of the 4 × 4 mixing matrix: the six parameters in Eq. (C.12) are a function of the six angles θ 12 , θ 13 , θ 14 , θ 23 , θ 24 , θ 34 . The dependence of solar neutrinos on θ 13 , θ 14 is shown in Fig. 3 and the one on θ 24 , θ 34 follows from Fig. 5. The dependence on θ 23 is important for the NC matter effect and SNO NC data. For s > 1 the number of effective mixing parameters of solar data is less than the number of angles in the general mixing matrix. The phase δ 12 is a complicated function of complex phases and angles. We have verified numerically for the 3+1 case that if all angles are non-zero (and θ 24 , θ 34 relatively large) the χ 2 from solar data varies by about 1 to 2 units as a function of the phase. Once all relevant constraints on the mixing angles are imposed the effect of the phase on solar data is negligible. See also [138] for a discussion of complex phases in solar neutrinos in the context of sterile neutrinos.

D Atmospheric neutrino analysis
The analysis of atmospheric data follows closely the one presented in Refs. [139,140] and includes the Super-Kamiokande results from phases I, II and III [141] (80 data points in total). Technical details on our χ 2 fit can be found in the appendix of [142].
In order to derive suitable expressions for the relevant probabilities, we can follow the approach presented in App. C for solar data. The Hamiltonian in the flavor basis is given by Eq. (C.1), which in the mass basis becomes: As before, we can simplify the analysis by assuming that all the mass-squared differences involving the "heavy" states ν h with h ≥ 4 can be considered as infinite: ∆m 2 hi → ∞ and ∆m 2 hh → ∞ for any i = 1, 2, 3 and h, h ≥ 4. In leading order, the matrixĤ takes the effective block-diagonal form:Ĥ whereĤ (3) is the 3×3 sub-matrix ofĤ corresponding to the first, second and third neutrino states, and ∆ (s) = diag(∆m 2 41 , ∆m 2 51 , . . . )/2E ν is a diagonal s×s matrix (the matter terms in this block are negligible in the limit of very large ∆m 2 hh ). We are interested only in the probabilities P αβ with α, β ∈ {e, µ, τ }. Taking into account the block-diagonal form ofĤ, we obtain: whereŜ (3) = Evol Ĥ (3) and U (3) is the 3 × 3 sub-matrix corresponding to the first three lines and columns of U , and as such it is not a unitary matrix.
Let us now focus on the three-neutrino system described byĤ (3) (3) . The matter termV (3) contains both the "standard" contribution from ν e charged-current interactions, and a "non-standard" part induced by the absence of neutral-current interactions for sterile neutrinos. In practical terms, this is just the same as the problem of neutrino propagation in the presence of non-standard neutrino-matter interactions (NSI) described in Ref. [140]. Following the approach discussed there, we make a number of simplifying assumptions: • we assume that the neutron-to-electron density ratio, R ne , is constant all over the Earth. We set R ne = 1.051 as inferred from the PREM model [143]; • we set ∆m 2 21 = 0, thus forcing the vacuum term ∆ (3) to have two degenerate eigenvalues; • we impose that the matter termV (3) also has two degenerate eigenvalues.
The first two assumptions are very well known hand have been discussed in detail in the literature. For example, in Sec. 5.2 of Ref. [142] it was noted that the different chemical composition of the Earth mantle and core has very little impact on NSI results. The last approximation is adopted here for purely practical reasons, since (together with the other two) it allows to greatly simplify the calculation of the neutrino evolution [140,144]. Unfortunately, in the present context the matter termV (3) cannot be fixed a priori, but it arises as an effective quantity determined by the angles and phases of the mixing matrix U . Finding the points in the general parameter space for whichV (3) has two degenerate eigenvalues (the only points for which our numerical analysis is technically feasible) is not an easy task. To solve this problem, we have considered here two alternative cases, both based on physically motivated scenarios: (a) decouple the electron flavor from the evolution and include the NC matter effect; (b) allow the electron flavor to participate in oscillations but neglect the NC matter effect.
Option (a) requires to set U ei = 0 for i ≥ 3 (in addition to ∆m 2 21 = 0), whereas option (b) is equivalent to setting R ne = 0 ("hydrogen-Earth" model). Both approximations have been previously discussed in App. C of Ref. [33]. Although we know that none of these options corresponds to Nature, we can make a sensible choice of when it is safe to use each of them. It turns out that for constraining the mixing of the ν µ with eV-scale neutrinos the NC matter effect plays no role at all (see discussion in App. C2 of [33]), whereas the participation of the electron flavor may have some impact. 13 Therefore, whenever we are mainly interested in constraining |U µh | (h ≥ 4), as in the case of the global analysis combined with SBL data, we adopt assumption (b). This is important since non-zero |U ei | leads to slightly relaxed constraints on |U µ4 |, although the effect is small once external constraints on |U ei | are taken into account. With respect to our former analysis presented in Ref. [33], the explicit NSI formalism adopted here is more general since it allows to fully include |U ei |-related effects without further approximations.
On the other hand, when exploring the sensitivity to the fraction of sterile neutrinos participating in atmospheric and long-baseline neutrino oscillations, the contribution of neutral-current neutrino-matter interactions is essential. Indeed, under approximation (b) no limit on |U τ i | (i ≥ 4) would be obtained. Therefore, when exploring constraints on |U τ i | we adopt assumption (a) above. This is relevant for Figs. 4 (right) and 5.

E Technical details on the simulation of SBL and LBL experiments
Here we provide technical details on the simulation of some of the experiments included in our fit. All the simulations described in this appendix make use of the GLoBES software package [145,146].
Both LSND and KARMEN have measured the reaction ν e + 12 C → e − + 12 N, where the 12 N decays back to 12 C + e + + ν e with a lifetime of 15.9 ms. By detecting the electron from the first reaction, which has a Q-value of 17.33 MeV, one can infer the neutrino energy.
Here we describe how we use these data to constrain ν e disappearance [94,95].
For KARMEN, we use the information from the thesis [94], which uses more exposure than the original publication [92]. The number of expected events can be calculated by multiplying the 12 C cross section (Fukugita et al. [93]: 9.2±1.1×10 −42 cm 2 ), the number of target nuclei (2.54×10 30 ), the absolute neutrino flux (5.23×10 21 ), the efficiency (27.2%, flat in energy), and the inverse effective scaled area (1/[4π(17.72m) 2 ]). 846 neutrino candidates are observed, with an expected background of 13.9 ± 0.7, which are mainly accidentals and cosmic induced. The systematic errors are dominated by a 6.7% uncertainty in the absolute neutrino flux and 3% in the Monte Carlo efficiency. The total systematic error is 7.5% plus a 12% cross section error. We take the latter to be correlated between the KARMEN and LSND ν e -carbon analyses. In fig. 3.2 (upper panel) of [94], the data are shown as the energy distribution of the prompt e − spectrum in 26 bins where the visible prompt electron energy is within the range 10 MeV < E e < 36 MeV. Modulo energy reconstruction the neutrino and electron energies would be related by the Q-value as E ν = E e + Q. For the simulation we assume 30 MeV < E ν < 56 MeV. To properly fit the data, we assume σ e = 25%/ E (MeV) for the energy resolution. With the 26 data points we obtain a two-neutrino best fit with χ 2 min /dof = 30/24. For LSND [91] we compute the expected number of events by multiplying the 12 C cross section (9.2 × 10 −42 cm 2 [93]), the number of target nuclei (3.34 × 10 30 ), the neutrino flux at the detector (10.58 × 10 13 cm −2 ), and the efficiency (23.2%, flat in energy). 733 neutrino candidates are observed, with a negligible expected background. The systematic error is dominated by a 7% uncertainty in the neutrino flux and a 6% uncertainty in the effective fiducial volume. The total systematic error, not including the theoretical cross section error, is 9.9%. The 12% cross section error is correlated between the LSND and KARMEN ν e -12 C analyses. In fig. 6 of Ref. [91], the data are shown as the energy distribution of the prompt e − spectrum, where the visible prompt electron energy is within the range 18 MeV < E e < 42 MeV and divided into 12 bins of width 2 MeV. In terms of the neutrino energy E ν , this energy range corresponds to 35.3 MeV < E ν < 59.3 MeV. In our analysis, we combine the 12 energy bins into only 6 bins. To properly fit the data, we assume σ e = 2.7 MeV for the energy resolution. With the 6 data points we obtain a two-neutrino best fit with χ 2 min /dof = 3.81/4. For the combined KARMEN+LSND ν e -carbon fit, we have 32 bins and we find a two-neutrino best fit point with χ 2 min /dof = 34.17/30.

E.2 E776
The pion beam experiment E776 at Brookhaven [40] employed a 230 ton calorimeter detector located at approximately 1 km from the end of the 50 m long pion decay pipe. The ( -) ν e energy of order GeV was measured with an energy resolution of 20%/ E [GeV]. E776 used ( -) ν µ disappearance data to obtain the overall normalization of the neutrino flux. In our fit, we do not explicitly include ( -) ν µ data, but instead use the normalization as an input. The main backgrounds in E776 came from intrinsic ( -) ν e contamination in the beam and from π 0 's produced in neutral current interactions and misidentified as electrons. The systematic errors were 11% for the intrinsic background and 27% (39%) for the π 0 background in neutrino (anti-neutrino) mode. In 1986, E776 collected 1.43 × 10 19 (1.55 × 10 19 ) protons on target, and a total of 136 (56) ν e (ν e ) candidate events were observed with an expected background of 131 (62) events for neutrino (anti-neutrino) mode. E776 present the observed and predicted electron energy spectra using 14 equidistant energy bins per polarity, covering the energy range from 0 GeV to 7 GeV. In our fit we omit the first bin and combine the second and third ones because modeling the detection efficiency at these low energies is very difficult. Hence, we have a total of 24 data points. We checked that we are able to reproduce well the exclusion curve shown in fig. 4 of ref. [40] (if we also use a two-flavor oscillation model), and we obtain χ 2 min /dof = 31.08/22 at the best fit point. In the combined analysis with other experiments we take into account oscillations for the ( -) ν e background.

E.3 ICARUS
The ICARUS experiment [147] is a neutrino beam experiment at Gran Sasso. The CNGS facility at CERN shoots 400 GeV protons at a graphite or beryllium target, producing a hadronic shower which is focused by a magnetic horn system. The resulting neutrino beam is mainly composed of ν µ , having only a 2%ν µ contamination and a < 1% intrinsic ν e component. The neutrino spectrum ranges approximately from 0 − 50 GeV, with a wide peak at 10 − 30 GeV. After traveling 732 km, they are detected in the ICARUS T600 detector, a 760 ton liquid argon time projection chamber. Between 2010 and 2012, the ICARUS detector observed 839 neutrino events with energy below 30 GeV, to be compared with the expectation of 627 ν µ and 3 ν τ charged current events, as well as 204 neutral current events. While a Monte Carlo simulation of the experiment predicts 3.7 ν e background events, only two were identified [41]. For our ICARUS simulation, we took the ν µ spectrum from Ref. [148] and folded it with the ν µ → ν e oscillation probability. To assess the limit on ν e appearance, we define the likelihood −2 ln(L) = 2(P − D) + 2D log(D/P ), where D (P ) is the total observed (expected) number of ν e events. Although there is a ∼ 7% systematic error in the selection efficiency, we have checked that its impact on the experiment sensitivity is negligible.

E.4 MiniBooNE
To analyze MiniBooNE data on ( -) ν µ → ( -) ν e oscillations, we use the data presented in [16], corresponding to 6.46 × 10 20 protons on target in neutrino mode and 11.27 × 10 20 protons on target in anti-neutrino mode, the same exposure as used in [17]. We follow the analysis strategy outlined in the supporting on-line documentation [16]. For each set of oscillation parameters we consider, we compute the expected neutrino and anti-neutrino energy spectra by weighting the unoscillated Monte Carlo events from MiniBooNE's data release [16] with the oscillation probabilities. For each simulated event, we take into account the individual neutrino energy and the distance between the neutrino production and detection vertices. We allow both signal and background neutrinos to oscillate including the "wrongsign contamination" of the signal. In computing the log-likelihood for any given parameter point, we take into account statistical and systematic uncertainties by using the covariance matrix published by MiniBooNE [16]. The likelihood is given by where D (P) is a column vector containing the observed (predicted) number of events including background, and S is the covariance matrix (see Ref. [16] for details). Note that this covariance matrix includes correlations between the ( -) ν e and ( -) ν µ event samples as well as between neutrino and anti-neutrino data (if analysed together). There are 11 bins for e-like data and 8 bins for µ-like data, for neutrino and anti-neutrino mode, each.
The ( -) ν µ -data are used to determine the normalization of the beam flux. In principle, we should therefore include also oscillations in the ( -) ν µ disappearance sample. The impact of ( -) ν µ oscillation on the fit to the ( -) ν e data would then be taken into account by the covariance matrix. This is problematic because it would prevent us from combining, without double-counting, our results with the independently obtained ones from MiniBooNE's dedicated ( -) ν µ disappearance searches (see below). Using the ( -) ν µ data from [16] directly for a disappearance analysis is also not possible because the corresponding ( -) ν µ prediction from [16] has been obtained from a Monte Carlo simulation in which some parameters have been tuned to the data assuming no ( -) ν µ disappearance. We have therefore decided to follow the MiniBooNE collaboration and not to include oscillations of ( -) ν µ in the appearance analysis. We have verified numerically that this has a negligible impact on the fit results once external limits on |U µ4 |, |U µ5 | are taken into account.
A further subtlety arises for the kaon-induced backgrounds. In MiniBooNE, these are predicted using information from the SciBooNE detector [149], which operated in the same beam, but at a shorter baseline. To account for this, we rescale the kaon-induced backgrounds in MiniBooNE by the ratio of the oscillation probabilities in MiniBooNE and in SciBooNE. Since kaon-induced backgrounds make only a small contribution to MiniBooNE's total error budget, the effect of this rescaling on the fit results is, however, very small.
Due to the correlation with the muon-like events it is not straight forward to assign a number of degrees of freedom to the appearance search without double counting the muon data, which are used also in the separate disappearance analysis. We have adopted the following prescription. Eq. (E.1) can be written as where (S µµ ) −1 is the inverse of the µµ sub-block of S. 14 Hence, we have block-diagonalized the covariance matrix. The shift δ corresponds to the impact of the µ-like data on the normalization of the e-like flux. The two terms in Eq. (E.2) should be statistically independent and approximately χ 2 distributed. For the MiniBooNE appearance analysis we therefore use χ 2 MB,app ≡ χ 2 − C, and assign 22 dof to it (for combined neutrino/antineutrino data). The last equality in Eq. (E.3) shows explicitly that C does not depend on oscillation parameters, since we neglect the effect of oscillations on d µ . With this method we obtain GOF values which are in reasonable agreement with the numbers obtained by the collaboration: our results for χ 2 min /dof (GOF) for neutrino, anti-neutrino, combined data are 14.2/9 (11%), 6.5/9 (69%), 32.9/20 (3.5%), respectively, compared to the numbers obtained in [16] 13.2/6.8 (6.1%), 4.8/6.9 (67.5%), 24.7/15.6 (6.7%). Note that in [16] the number of dof and GOF have been determined by explicit Monte Carlo study and also a different energy range has been used to obtain those numbers.
For our MiniBooNE ν µ disappearance analysis, we use the neutrino mode data from [44]. As in appearance mode, we compute the expected event spectra for each parameter point by using MiniBooNE's Monte Carlo events. Since backgrounds are very small for this analysis, we do not need to take them into account. For each set of oscillation parameters, we choose the overall normalization of the spectrum in such a way that the total predicted number of events matches the number of observed events in MiniBooNE, i.e., we fit only the event spectrum, not the normalization. The log-likelihood is obtained in analogy to Eq. (E.1) and thus takes into account systematic uncertainties and correlations between different energy bins.
In the analysis ofν µ disappearance data, we follow the combined MiniBooNE/SciBooNE analysis from [45]. As for the appearance and ν µ disappearance analyses, we use public Monte Carlo data to compute the predicted event spectra. We take into account oscillations of both the signal and the background, and we compute the log-likelihood again in analogy to Eq. (E.1).

E.5 MINOS
Our analysis of MINOS neutral current (NC) and charged current (CC) interaction is based on 7.2 × 10 20 protons on target of NC data presented by the collaboration in Ref. [150] (see also [42,43]) and 7.25 × 10 20 protons on target of CC data published in [151]. All data was recorded in neutrino mode, i.e., the beam consists mostly of ν µ and only small contaminations ofν µ , ν e andν e .
We have implemented the properties of the NuMI beam and the MINOS detector within the GLoBES framework [145,146], using results from the full MINOS Monte Carlo simulations as input wherever possible. In particular, we use tabulated Monte Carlo events [152] to construct the detector response functions R ND (E true , E rec ) and R FD (E true , E rec ) for neutral current events in the near detector (ND) and the far detector (FD), respectively. R ND and R FD describe the probability for a neutrino with true energy E true to yield an event with reconstructed energy E reco . We include the CC ν µ and CC ν e backgrounds to the NC event sample (beam intrinsic as well as oscillation induced), as well as the small NC background to the CC ν µ event sample. The number of charged current interactions is predicted using the simulated NuMI flux [153], the cross sections calculated in [154][155][156], and a Gaussian energy resolution function with width σ CC E /E true = 0.1/ E true /GeV for the CC event sample and σ CC-bg E /E true = 0.16 + 0.07/ E true /GeV for the CC background in the NC event sample. The parameters of the energy resolution function, as well as the efficiencies, have been tuned in order to optimally reproduce the unoscillated event rates predicted by the MINOS Monte Carlo. We emulate the baseline uncertainty due to the non-vanishing length of the MINOS decay pipe by smearing the oscillation probabilities with an additional Gaussian of widthσ 2 = (2.0 GeV−3.0×E true ) 2 and setting the effective distance between near detector and neutrino source to 700 m. This value as well as the parameters of the smearing function have been obtained numerically from the requirement that 2-flavor oscillation probabilities at the near detector computed with a more accurate treatment of the decay pipe are well reproduced for various mass squared differences of order eV.
We compute neutrino oscillations in MINOS numerically using a full 4-or 5-flavor code that includes all relevant mixing parameters as well as CC and NC matter effects. In the fit, we predict the expected number of events at the far detector for a given set of oscillation parameters by multiplying the observed number of near detector events in each energy bin with the simulated ratio of far and near detector events in that bin. Each event sample is divided into 20 energy bins with a width of 1 GeV each, covering the energy range from 0 to 20 GeV. As systematic uncertainties, we include in the analysis of the NC (CC) sample a 4% (10%) overall normalization uncertainty on the far-to-near ratio, separate 15% (20%) uncertainties in the background normalization at the far and near detectors, a 3% (5%) error on the energy calibration for signal events and a 1% (5%) error on the energy calibration for background events. For the NC analysis, the systematic uncertainties are based on the information given in [42], for the CC analysis they have been tuned in order to reproduce the collaboration's fit with reasonable accuracy, while still remaining very conservative.