A Statistical Analysis of the COHERENT Data and Applications to New Physics

The observation of coherent elastic neutrino-nucleus scattering (CE$\nu$NS) by the COHERENT collaboration in 2017 has opened a new window to both test Standard Model predictions at relatively low energies and probe new physics scenarios. Our investigations show, however, that a careful treatment of the statistical methods used to analyze the data is essential to derive correct constraints and bounds on new physics parameters. In this manuscript we perform a detailed analysis of the publicly available COHERENT CsI data, making use of all available background data. We point out that, due to the low statistics of the CsI data, Wilks' theorem is not fulfilled in general, and a calculation of the $p$ values and confidence regions via Monte Carlo simulations is necessary. As an example of the necessity of this approach when testing new physics scenarios, we quantify the allowed ranges for several scenarios with neutrino non-standard interactions. We emphasize the effect of binning, which can significantly weaken constraints if incorrectly applied. Furthermore, we provide accompanying code to enable an easy implementation of other new physics scenarios, as well as data files of our results.


Introduction
Coherent elastic neutrino-nucleus scattering (CEνNS) is a Standard Model (SM) process, predicted in 1974 [1,2], in which a neutrino scatters coherently off all of the nucleons in a nuclear target. The individual nucleonic amplitudes sum up coherently, which enhances the cross section by the number of nucleons squared. Despite its large cross section, observation of this process is challenging due to the small nuclear recoil energies involved. Nevertheless, thanks to the experimental advances of the last decade in detecting low recoil energies, CEνNS was observed for the first time in 2017 by the COHERENT collaboration [3]. Since then a large interest in constraining standard and beyond the Standard Model (BSM) physics with this measurement has emerged. The ability of CEνNS to probe SM parameters at low momentum transfer, to test new neutrino interactions, and to search for sterile neutrinos, as well as its implications for supernova physics, dark matter searches, constraints on neutrino magnetic moments, and nuclear physics, shows the far-reaching influence of this process in many different areas of particle physics. For this reason a careful statistical analysis is crucial to derive robust and reliable constraints on SM and BSM physics scenarios from CEνNS.
In this manuscript we revisit and go beyond previous analyses to derive statistically robust constraints on SM and BSM parameters using the publicly available data from the COHERENT CsI observation [70]. We consider the timing and energy information provided in the data release. We find that the test statistic for the CsI data is not distributed according to a χ² distribution with the number of degrees of freedom given by the number of energy and timing bins, making a calculation of the p value via Monte Carlo (MC) simulations necessary. As an example of a new physics scenario we quantify the allowed ranges for neutrino non-standard interactions (NSI). For the first time in the literature we determine the allowed regions for all five relevant NSI parameters using MC simulations. We also revisit the constraint from COHERENT CsI data on the LMA Dark solution, which describes an important degeneracy in neutrino oscillations in the presence of NSI.
Our analysis differs from previous analyses not only in the statistical approach but also in the treatment of the background and signal. Instead of considering only the "anticoincidence beam on" data set as background, we additionally consider the beam off data as background to enhance the background statistics. Furthermore, we address the question of the ideal number of bins for the example of NSI. Our analysis is implemented in a code which can be downloaded from [71]. We also provide our results in the form of data files which can be downloaded from the same source.
This manuscript is organized as follows: in sec. 2 we will give an overview of the COHERENT experiment and the CEνNS process, in sec. 3 we describe the calculation of the signal and background events at the COHERENT CsI detector. Sec. 4 is devoted to the analysis of the CsI data, in sec. 5 we use the data to derive constraints on the allowed region for NSI parameters in different scenarios, and finally we summarize and conclude in sec. 6.

CEνNS in the Standard Model
CEνNS is a neutral current process which takes place for neutrino energies below about 50 MeV. In the SM it is mediated by the Z boson. The cross section for CEνNS for a neutrino of flavor α is given by [1]
$$\frac{\mathrm{d}\sigma_\alpha}{\mathrm{d}E_R} = \frac{G_F^2 M}{4\pi}\, Q_\alpha^2 \left(1 - \frac{M E_R}{2E_\nu^2} - \frac{E_R}{E_\nu} + \frac{E_R^2}{2E_\nu^2}\right) F^2(Q^2)\,, \quad (2.1)$$
where $G_F$ is the Fermi constant, $E_R$ is the nuclear recoil energy, $F(Q^2)$ is the nuclear form factor, $M$ is the mass of the target nucleus, and $E_\nu$ is the incident neutrino energy. In the kinematic term the last two terms are suppressed by $E_R/E_\nu$ in comparison to the first terms. The weak charge is given by
$$Q_\alpha = Z g_p^V + N g_n^V\,. \quad (2.2)$$
Here, $N$ and $Z$ are the number of neutrons and protons in the target nucleus. The SM couplings of the Z boson to protons and neutrons at low energies are given by [72]
$$g_p^V = \rho^{NC}_{\nu N}\left(\tfrac12 - 2\hat\kappa\,\hat s_Z^2\right) + 2\lambda_{uL} + 2\lambda_{uR} + \lambda_{dL} + \lambda_{dR}\,, \qquad g_n^V = -\tfrac12\rho^{NC}_{\nu N} + \lambda_{uL} + \lambda_{uR} + 2\lambda_{dL} + 2\lambda_{dR}\,, \quad (2.3)$$
with $\rho^{NC}_{\nu N} = 1.0082$, $\hat s_Z^2 = \sin^2\theta_W = 0.23129$, $\hat\kappa = 0.9972$, $\lambda_{uL} = -0.0031$, $\lambda_{dL} = -0.0025$, and $\lambda_{dR} = 2\lambda_{uR} = 7.5\cdot 10^{-5}$. The contribution from the coupling to strange sea quarks in the nucleus is negligible for the process considered here [73]. The nuclear form factor $F(Q^2)$ depends on the nuclear density distribution and is related to the physical size of the nucleus. It accounts for the loss of coherence at higher values of momentum transfer; for small momentum transfer $F \sim 1$. In agreement with the official COHERENT analysis [3] we choose the Klein-Nystrand parametrization [74] for the form factor, which is given as
$$F(Q^2) = \frac{4\pi\rho}{A\,Q^3}\left[\sin(Q R_a) - Q R_a \cos(Q R_a)\right]\frac{1}{1 + a^2 Q^2}\,, \quad (2.4)$$
with $a = 0.7$ fm, $R_a = 1.2\,A^{1/3}$ fm, and $\rho = 3A/(4\pi R_a^3)$. Using a different form factor has only a small effect on the number of signal events; nonetheless we will account for the form factor uncertainty in the analysis in sec. 5 by including a systematic uncertainty on the normalization of the signal.
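The Klein-Nystrand form factor can be evaluated in a few lines; the following is a minimal Python sketch (function and variable names are ours, not from the official analysis code). The momentum transfer is assumed to be given in fm⁻¹, i.e. $Q = \sqrt{2 M E_R}$ converted via $\hbar c \approx 197.327$ MeV fm.

```python
import numpy as np

def form_factor_kn(Q, A, a=0.7, r0=1.2):
    """Klein-Nystrand nuclear form factor.

    Q  : momentum transfer [fm^-1]
    A  : mass number of the nucleus
    a  : Yukawa range parameter [fm]
    r0 : radius parameter [fm], with R_a = r0 * A**(1/3)
    """
    Ra = r0 * A ** (1.0 / 3.0)
    x = Q * Ra
    # hard-sphere piece 3 j1(x)/x, folded with a Yukawa of range a;
    # this is the simplified form obtained with rho = 3A/(4 pi R_a^3)
    sphere = 3.0 * (np.sin(x) - x * np.cos(x)) / x**3
    return sphere / (1.0 + (a * Q) ** 2)
```

For small momentum transfer the function approaches one, recovering full coherence, and it falls off as the momentum transfer grows, encoding the loss of coherence over the CsI recoil spectrum.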
The expected rate of CEνNS events depends on the specifications of the detector considered and on the timing and energy structure of the neutrino source. In the following we focus on the CsI detector of the COHERENT experiment. The COHERENT experiment is located at the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory. The pulsed source produces π⁺ and π⁻ in proton-nucleus collisions in a mercury target. The π⁻ are absorbed by nuclei before they can decay; the π⁺ lose energy as they propagate and eventually decay at rest via $\pi^+ \to \mu^+ + \nu_\mu$, followed by the muon decay $\mu^+ \to e^+ + \nu_e + \bar\nu_\mu$. As the muon lifetime is much longer than that of the pion, the monochromatic $\nu_\mu$ component (at $E_{\nu_\mu} = (m_\pi^2 - m_\mu^2)/(2m_\pi) \approx 29.7$ MeV) is referred to as the prompt flux, while the delayed neutrino flux from muon decay ($\nu_e$ and $\bar\nu_\mu$) has a continuous energy spectrum up to $E_{\nu_e,\bar\nu_\mu} < m_\mu/2 \approx 52.8$ MeV. From simple decay kinematics the incoming neutrino fluxes for the COHERENT experiment are given, normalized per pion decay, as
$$\phi_{\nu_\mu}(E_\nu) = \delta\!\left(E_\nu - \frac{m_\pi^2 - m_\mu^2}{2m_\pi}\right), \quad \phi_{\bar\nu_\mu}(E_\nu) = \frac{64}{m_\mu}\left(\frac{E_\nu}{m_\mu}\right)^2\left(\frac34 - \frac{E_\nu}{m_\mu}\right), \quad \phi_{\nu_e}(E_\nu) = \frac{192}{m_\mu}\left(\frac{E_\nu}{m_\mu}\right)^2\left(\frac12 - \frac{E_\nu}{m_\mu}\right). \quad (2.5)$$
The expected number of CEνNS events per nuclear recoil energy can be calculated with
$$\frac{\mathrm{d}N}{\mathrm{d}E_R} = N_{\rm target} \sum_\alpha \int \mathrm{d}E_\nu\, \Phi_\alpha(E_\nu)\, \frac{\mathrm{d}\sigma_\alpha}{\mathrm{d}E_R}\,, \quad (2.6)$$
where $N_{\rm target}$ is the number of target nuclei and $\Phi_\alpha$ is the flux of eq. (2.5) rescaled by the number of pion decays and the inverse square of the distance to the detector. The mass of the CsI detector is 14.6 kg. The mass and atomic numbers of the Cs and I nuclei are similar ($A_I = 127$, $Z_I = 53$ and $A_{Cs} = 133$, $Z_{Cs} = 55$); nevertheless we calculate the individual cross sections separately and weight the number of events according to the nuclear masses.
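The delayed (Michel) spectra entering eq. (2.5) can be checked numerically. Below is a short Python sketch (names are ours) verifying that each delayed-flux component integrates to one neutrino per pion decay, as implied by the decay chain above.

```python
import numpy as np

M_MU = 105.658  # muon mass [MeV]

def flux_nue(E):
    """nu_e spectrum from mu+ decay at rest, normalized per pion decay."""
    x = E / M_MU
    return np.where(E < M_MU / 2.0, (192.0 / M_MU) * x**2 * (0.5 - x), 0.0)

def flux_numubar(E):
    """anti-nu_mu spectrum from mu+ decay at rest, normalized per pion decay."""
    x = E / M_MU
    return np.where(E < M_MU / 2.0, (64.0 / M_MU) * x**2 * (0.75 - x), 0.0)

E_grid = np.linspace(0.0, M_MU / 2.0, 2001)
norm_e = np.trapz(flux_nue(E_grid), E_grid)       # integrates to ~1
norm_mu = np.trapz(flux_numubar(E_grid), E_grid)  # integrates to ~1
```

The prompt $\nu_\mu$ component is a delta function in energy and is implemented in practice as a single monochromatic line at 29.7 MeV.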

NSI Review
In BSM theories other neutral particles can contribute to CEνNS. The effect of new heavy mediators can be parametrized in an effective field theory approach when the mediator mass exceeds the momentum transfer Q. For NSI affecting CEνNS at COHERENT this is the case for mediator masses above ∼100 MeV [19]. The framework for such NSI was introduced in [75].
In the NSI framework neutrinos experience a new, possibly flavor changing, interaction governed by an addition to the Lagrangian,
$$\mathcal{L}_{\rm NSI} = -2\sqrt{2}\, G_F \sum_{f,P,\alpha,\beta} \epsilon_{\alpha\beta}^{f,P}\, (\bar\nu_\alpha \gamma^\mu P_L \nu_\beta)(\bar f \gamma_\mu P f)\,, \quad (2.7)$$
where α and β refer to the flavor indices of the neutrinos, $P = P_L, P_R$, and $f$ stands for an SM charged fermion, typically $e$, $u$, or $d$. In this notation, $\epsilon^f_{\alpha\beta}$ provides an effective field theory parametrization of the strength of the new interaction with respect to the Fermi constant, that is, $\epsilon^f_{\alpha\beta} \sim \mathcal{O}(G_X/G_F)$ with the new physics effective coupling $G_X$. Many interesting UV complete NSI models leading to $|\epsilon^{f,V}_{\alpha\beta}| \sim \mathcal{O}(0.1)$ or larger have been developed in recent years [76][77][78][79][80][81][82][83]. NSI provide a useful means of connecting new physics processes involving neutrinos to oscillation physics. In fact, there have been several hints in oscillation data of new physics that could be explained by NSI with $|\epsilon^{f,V}_{\alpha\beta}| \sim 0.01-0.1$ [77,[84][85][86][87]. For an overview of the breadth of NSI physics, see [88].
In general an axial coupling can be present in neutral current NSI, but its effect is negligible in coherent elastic scattering of neutrinos off heavy nuclei [72] and becomes important only for light targets like Na, as the relative contribution scales with the inverse of the number of nucleons, assuming comparable axial and vector couplings. As the constraints on scalar interactions are similar to those on vector interactions, we focus only on vector interactions. The weak charge from eq. (2.2) is now replaced by
$$Q_{\alpha,{\rm NSI}}^2 = \left[ Z\left(g_p^V + 2\epsilon_{\alpha\alpha}^{uV} + \epsilon_{\alpha\alpha}^{dV}\right) + N\left(g_n^V + \epsilon_{\alpha\alpha}^{uV} + 2\epsilon_{\alpha\alpha}^{dV}\right)\right]^2 + \sum_{\beta\neq\alpha}\left[ Z\left(2\epsilon_{\alpha\beta}^{uV} + \epsilon_{\alpha\beta}^{dV}\right) + N\left(\epsilon_{\alpha\beta}^{uV} + 2\epsilon_{\alpha\beta}^{dV}\right)\right]^2\,. \quad (2.8)$$
In principle, if the mediator of the interaction has a mass similar to the energy scale of the experiment, it can introduce an energy dependent effect in eq. (2.8). A light mediator can also affect the value of the form factor and the values of $g_p^V$, $g_n^V$ via running effects. We focus only on the heavy mediator case here, where heavy means $\gtrsim 100$ MeV. We also assume that the new physics resides dominantly on the neutrino side and that the quark-mediator coupling is smaller than the neutrino-mediator coupling. Due to a degeneracy in the cross section there are two distinct values of $\epsilon_{\alpha\alpha}$ which lead to the same cross section; hence we expect two exactly degenerate minima in the test statistic when the term inside the square brackets on the first line of eq. (2.8) changes sign.
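For illustration, eq. (2.8) fits in a few lines of Python. The coupling values below are approximate numbers computed from the low-energy SM parameters quoted in sec. 2, and all names are ours; the sketch also exhibits the degeneracy in $\epsilon_{\alpha\alpha}$ just discussed.

```python
# approximate SM Z couplings to p and n, evaluated from the parameters of sec. 2
G_V_P, G_V_N = 0.0305, -0.5121

def weak_charge_sq(Z, N, eps_uu=0.0, eps_dd=0.0, eps_offdiag=()):
    """Squared weak charge with vector NSI (sketch of eq. (2.8)).

    eps_uu, eps_dd : flavor-diagonal NSI couplings to u and d quarks
    eps_offdiag    : iterable of (eps_u, eps_d) pairs for beta != alpha
    """
    diag = Z * (G_V_P + 2.0 * eps_uu + eps_dd) + N * (G_V_N + eps_uu + 2.0 * eps_dd)
    off = sum((Z * (2.0 * eu + ed) + N * (eu + 2.0 * ed)) ** 2
              for eu, ed in eps_offdiag)
    return diag**2 + off

# degenerate partner of the SM point for Cs (Z = 55, N = 78), with eps_u = eps_d:
# the bracket changes sign, leaving the squared weak charge unchanged
eps_deg = -2.0 * (55 * G_V_P + 78 * G_V_N) / (3.0 * (55 + 78))
```

Evaluating `weak_charge_sq(55, 78, eps_deg, eps_deg)` reproduces the SM value `weak_charge_sq(55, 78)`, which is the origin of the two degenerate minima of the test statistic.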
As the CEνNS process with only one target nucleus is not sensitive to the difference between up and down quarks, we consider $\epsilon^{u,V} = \epsilon^{d,V}$ (our results can be easily translated to other up-to-down quark ratios) as in [19]. As COHERENT is not sensitive to $\epsilon^V_{\tau\tau}$, we are left with 5 parameters to constrain ($\epsilon^V_{ee}$, $\epsilon^V_{\mu\mu}$, $\epsilon^V_{e\mu}$, $\epsilon^V_{e\tau}$, $\epsilon^V_{\mu\tau}$). The remaining NSI parameter $\epsilon^V_{\tau\tau}$ can be connected to these parameters via the tight constraint on $\epsilon^V_{\tau\tau} - \epsilon^V_{\mu\mu}$ from oscillations.

CEνNS at COHERENT CsI
In order to obtain the CEνNS signal events at the COHERENT CsI detector, the following procedure needs to be implemented. Eq. (2.6) provides the number of events per nuclear recoil energy. However, the COHERENT CsI detector does not directly measure nuclear recoil energy; instead it records the number of photoelectrons (PE) produced by an event. One can relate $E_R$ to the number of PE via the quenching factor Q and the light yield Y. The quenching factor accounts for the fact that when the nucleus recoils, its energy is dissipated through a combination of scintillation (ionization of the material) and secondary nuclear recoils (heat). The characteristic signal of a nuclear recoil is the secondary recoils; however, their measurable signal is much smaller than that of electron recoils. The ratio between the light yields of a nuclear and an electron recoil of the same energy is the quenching factor Q. The light yield parametrizes the amount of electron recoil energy which is converted into PE; it is provided in the data release as Y = 13.348 ± 0.019 PE/keVee, where keVee is the electron recoil energy in keV [70]. The measurement of the quenching factor is fairly complicated, and a number of measurements of Q have been performed by several collaborations. The agreement between the measurements, however, is quite poor. A recent measurement [89] which claimed smaller error bars could not be confirmed by the COHERENT collaboration. For this reason we follow the recommendation of the collaboration [90] and use the quenching factor quoted in the data release, which is independent of the nuclear recoil energy and whose error bars encompass the lowest and highest measurements of the quenching factor in the considered region for nuclear recoils. This gives Q = 8.78 ± 1.66% [70]. The uncertainties on both the light yield and the quenching factor only affect the normalization of the signal and are taken into account in secs. 4 and 5 as pull terms on the signal.
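The conversion from nuclear recoil energy to the expected (raw) number of PE is then simply the product of recoil energy, quenching factor, and light yield; a minimal sketch (names ours) using the central values quoted above:

```python
Q_F = 0.0878          # quenching factor, central value 8.78%
LIGHT_YIELD = 13.348  # light yield [PE / keVee]

def recoil_to_pe(e_r_kevnr):
    """Expected raw number of PE for a nuclear recoil energy in keVnr."""
    return e_r_kevnr * Q_F * LIGHT_YIELD
```

With these central values the analysis threshold of 7 PE corresponds to a nuclear recoil of roughly 6 keVnr.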
It is important to note that the number of PE produced in a given interaction follows a Poisson distribution, and an additional smearing must be accounted for to relate the expected number of PE, PE_raw, to the observed number, PE_true. The relation between the number of events in the raw PE bins before smearing and the number of events in the true PE bins after smearing is given by
$$N(\mathrm{PE}_{\rm true}) = \sum_{\mathrm{PE}_{\rm raw}} P(\mathrm{PE}_{\rm true};\, \mathrm{PE}_{\rm raw})\, N(\mathrm{PE}_{\rm raw})\,, \quad (3.1)$$
where the Poisson distribution is
$$P(n;\mu) = \frac{\mu^n e^{-\mu}}{n!}\,. \quad (3.2)$$
The effect of smearing depends on the number of true energy bins considered for the analysis. If only one energy bin is considered (that is, energy information is not included in the analysis), then smearing has no effect. For more energy bins, in particular for all 12 available energy bins, we have found that smearing needs to be included. Smearing also affects the background events. At COHERENT there are two sources of backgrounds 1 . The neutron background comes from the neutrons produced in the beam; their arrival time is the same as that of the prompt neutrino flux. The steady state background comes from cosmic rays or their by-products entering the detector; their arrival time is not related to the beam time.
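Eqs. (3.1) and (3.2) amount to multiplying the raw-PE histogram by a Poisson smearing matrix; a short Python sketch (names ours):

```python
import numpy as np
from scipy.stats import poisson

def smear(n_raw, pe_raw_means, pe_true_values):
    """Poisson-smear a raw-PE histogram into true-PE bins (eqs. (3.1), (3.2)).

    n_raw          : events per raw PE bin
    pe_raw_means   : expected PE of each raw bin (the Poisson means)
    pe_true_values : integer observed-PE values of the true bins
    """
    k = np.asarray(pe_true_values)[:, None]  # true PE values (rows)
    mu = np.asarray(pe_raw_means)[None, :]   # raw-bin means (columns)
    S = poisson.pmf(k, mu)                   # S[i, j] = P(PE_true_i; PE_raw_j)
    return S @ np.asarray(n_raw, dtype=float)
```

If the true-PE values cover all non-negative integers, the smearing conserves the total number of events; restricting to the analysis window of 7-30 PE removes the events smeared outside it.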
The COHERENT collaboration released the results of a simulation of the energy (in terms of raw PE bins) and timing distribution of the neutron background. Additionally, the collaboration released four data files: the coincidence beam on data (C-ON), which is the "signal" data, the coincidence beam off data (C-OFF), and the anti-coincidence beam on and beam off data (AC-ON and AC-OFF) as the "background" data. The difference between coincidence and anti-coincidence is determined by two different timing windows, where the coincidence window is when the SNS beam arrives (C-OFF is when the beam would arrive if it were on); contributions from the SNS beam are only expected in the coincidence window [3,91]. As the background is expected to be uncorrelated between beam on and off, and in particular not related to the coincidence or anti-coincidence regions, all three templates should be used as background to increase the background statistics and hence lower the uncertainty on the background normalization. This approach differs from all previous analyses, which only used the AC-ON data to estimate the background. All three backgrounds are compatible, as expected (see fig. 1 for a comparison of the 1D projections of the AC-ON data and the rescaled sum of all three backgrounds). We then perform a weighted sum (the beam off data includes 153.5 days of exposure, compared to 308.1 days of exposure when the beam was on [3]) to account for the different exposures and refer to this as the steady state background. The steady state background is the largest contribution to the number of events. In fact, the number of background events is more than twice as large as the number of signal events, making a correct treatment of the background crucial to derive constraints on the signal.
We have confirmed that the timing and energy of the background are uncorrelated, consistent with previous analyses by COHERENT [3,91]. For this reason it is possible to factorize the background data, producing 1D projections in energy and time to reduce the effect of statistical fluctuations in the background model. The full 2D template can then be obtained by multiplying these two distributions. We follow this approach after accounting for the cuts in energy and time, which are $7 \le \mathrm{PE}_{\rm true} \le 30$ and $t \le 6\,\mu$s from the beginning of the pulse.
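Given that energy and time are uncorrelated, the 2D background template follows from the outer product of the 1D projections; a minimal sketch (names ours):

```python
import numpy as np

def factorized_template(counts_2d):
    """Replace a noisy 2D histogram (time x energy) by the outer product of
    its 1D projections, valid when the two variables are uncorrelated."""
    c = np.asarray(counts_2d, dtype=float)
    proj_time = c.sum(axis=1)    # projection onto the timing axis
    proj_energy = c.sum(axis=0)  # projection onto the energy axis
    return np.outer(proj_time, proj_energy) / c.sum()
```

The template preserves the total number of events and reproduces an exactly separable input; for noisy data it suppresses bin-to-bin fluctuations by averaging over the other axis.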
Smearing needs to be applied to the sum of signal and background events according to eq. (3.1). However, the steady state background is only provided as measured events in true PE bins and not as events in raw PE bins. In order to circumvent this problem and to obtain self-consistent results, one could attempt to invert the effect of smearing and efficiency on the background and infer the underlying functional dependence by considering various functional forms for the true background, applying smearing and efficiency, and comparing to the data. After trying many functional forms we found, however, that none provided an acceptable fit to the data.
One could also attempt to reconstruct the entire 2D distribution spanning 144 bins, accounting for the detector efficiency and smearing, and then fit to the measured background data with 144 degrees of freedom. We found this to be numerically unreliable due to the considerable fluctuations in the background data. In addition, any attempt to simply parametrize the 2D background proved to be infeasible without a very large number of fit parameters, which likely leads to overfitting. Finally, the features in the data, in particular the fact that the timing distribution is not flat, do not have a clearly motivated physics interpretation, and simple functional fits do not provide a good fit to the timing distribution 2 . For this reason we use the method described above of projecting the data down to 1D to determine the timing and energy distributions, and then projecting back to the full 2D distribution.
As we will see numerically in the following, the low statistics of the current CsI data effectively dictate that the largest exclusion power comes from using the smallest number of bins that still retains the maximum amount of information about the physics in question. If only one energy bin is used, then smearing and an analytic fit to the energy and timing bins are not necessary. However, as statistics increase it may become preferable to include more energy bins, since the different flavors have slightly different energy spectra in the SM. In addition, for models that predict energy dependent phenomena, such as NSI mediated by light new physics particles or other dark sector scenarios, energy information provides a powerful probe, and using more than one energy bin will likely yield tighter constraints.
Finally, the signal acceptance efficiency C of the CsI detector needs to be taken into account for the signal. An analytic parametrization of sigmoid form in terms of PE_true is provided in [70],
$$C(\mathrm{PE}_{\rm true}) = \frac{a}{1 + e^{-k(\mathrm{PE}_{\rm true} - x_0)}}\,, \quad (3.4)$$
with the parameters $a$, $k$, $x_0$ and their uncertainties given in the data release. The integral over eq. (2.6) in raw PE space gives the energy distribution of the signal. The timing distribution of the signal has been provided by the collaboration [70]. Multiplying the two distributions gives the 2D timing-energy distribution. As the neutron background has been provided in terms of raw PE, it can be added to the signal such that smearing and efficiency can be applied to both distributions simultaneously. Adding to this the steady state background distribution leads to the 2D distribution of expected events. The uncertainties in the efficiency parameters in eq. (3.4) have a small effect on our results; nonetheless, we include the impact of the uncertainty on $a$ in our normalization uncertainty.

Test statistic
With the approach described in the previous section we calculate the p value of the SM using a test statistic (TS). After applying the cuts on the released data files, the considered energy bins range over [7,30] PE_true and the timing bins over [0,6] µs.
We use Poisson statistics to calculate the log likelihood ratio in the i-th timing and j-th energy bin, sum over all bins, and minimize over the nuisance parameters (pull terms),
$$\mathrm{TS} = \min_{\alpha,\beta,\gamma}\left\{ 2\sum_{i,j}\left[ T_{ij} - D_{ij} + D_{ij}\ln\frac{D_{ij}}{T_{ij}} \right] + \left(\frac{\alpha}{\sigma_\alpha}\right)^2 + \left(\frac{\beta}{\sigma_\beta}\right)^2 + \left(\frac{\gamma}{\sigma_\gamma}\right)^2 \right\}\,, \quad (4.1)$$
where $D_{ij}$ is the number of events taken from the C-ON data set and $T_{ij}$ stands for the theoretically predicted number of events, calculated as the sum of the steady state background events ($N^{\rm bkg}_{ij}$) and the neutron background ($N^{\rm neut}_{ij}$) plus signal ($N^{\rm sig}_{ij}$) after their raw PE distribution has been smeared according to the Poisson distribution $P$ of eq. (3.2) and the efficiency $C(\mathrm{PE}_{\rm true})$ has been applied,
$$T_{ij} = (1+\gamma)\, N^{\rm bkg}_{ij} + C(\mathrm{PE}_{{\rm true},j}) \sum_{\mathrm{PE}_{\rm raw}} P(\mathrm{PE}_{{\rm true},j};\, \mathrm{PE}_{\rm raw}) \left[ (1+\alpha)\, N^{\rm sig}_{i}(\mathrm{PE}_{\rm raw}) + (1+\beta)\, N^{\rm neut}_{i}(\mathrm{PE}_{\rm raw}) \right]\,. \quad (4.2)$$
We include pull terms in the likelihood function to account for the uncertainty of the signal normalization α, with σ_α coming from the uncertainty on the quenching factor (25%) (we prefer to include a larger uncertainty than quoted due to the disagreement between different measurements of Q [3,89,[91][92][93][94]), and uncertainties on the neutrino flux (10%) 3 , light yield (0.14%), signal acceptance (5%), and form factor (5%) [3], added in quadrature. The uncertainty on the normalization of the neutron background β is σ_β = 25% [3]. We calculate the uncertainty on the steady state background normalization γ by using the square root of the average number of events per bin in the timing and energy distributions of the sum of the three (C-OFF, AC-ON, AC-OFF) unweighted background distributions and add them in quadrature. A Gaussian prior is typically assumed for these pull terms. However, when the true uncertainty on a given systematic is dominated by Poisson fluctuations in another measurement, is otherwise asymmetric, or when the pull term is pulled to ∼ ±1, a Gaussian pull term is no longer appropriate. For a small number of events the likelihood function is usually skewed, resulting in asymmetric error intervals and pull distributions that are non-Gaussian [96].
In order to obtain the correct pull term distribution for the CsI data, insight into the calculation of the uncertainties (for example the calculation of the uncertainty on the quenching factor or the MC calculation of the neutron background) is required. This information has not been published. As an example, it can already be seen from eq. (3.4) that the error on the normalization of the signal efficiency is asymmetric. Using a symmetric pull term can lead to unphysical results; for this reason it is desirable that the collaboration provide the correct pull term parametrization. Nonetheless, we find that the impact of assuming Gaussian pull terms is small in the optimal binning configurations; see appendix A for further validation.

Binning
Due to the low statistics of the signal, it is desirable to reduce fluctuations as much as possible. One way to do so is to bin the data such that the number of events per bin is rather large, as Poisson fluctuations scale like the square root of the number of events per bin. However, choosing bins that are too large can significantly decrease the sensitivity to SM effects or to new physics, in particular if the new physics contribution predicts an energy or timing behavior different from the SM. Hence the question of the ideal number of bins arises; in general the answer depends on the model to be tested. To demonstrate the impact of different numbers of bins we present our results for various binning configurations:
- 1T,1E: a counting experiment,
- 2T,1E: two timing bins, [0,1] and [1,6] µs, splitting the signal into the prompt and delayed components (see eq. (2.5)),
- 2T,4E: in addition to the two timing bins of 2T,1E, we also bin the energy data into [7,18], [19,22], [23,26], and [27,30] PE, for eight total bins.
In general the boundaries of the bins need to be adapted to the specific model to be tested. As we are interested in flavor specific NSI effects, choosing the timing bins such that they separate the different neutrino flavors improves the constraints. In addition, as we focus on NSI with heavy mediators (with masses above 100 MeV) which do not affect the energy distribution of the signal, we find that including many energy bins does not add information. Instead, splitting the number of events among several energy bins increases the sensitivity to fluctuations in the data. Hence we will show that the 2T,1E configuration leads to the tightest constraints for heavy NSI. For different models, such as NSI with lighter mediators (see e.g. [8,19]) or sterile neutrinos (see e.g. [47]), one would anticipate that additional energy bins would be helpful.
In principle, such an optimization should be done before data is collected for a given new physics scenario.

Results
In this section we demonstrate our improved statistical approach to obtain constraints. We first discuss the goodness of fit of the SM and then a collection of non-standard interactions scenarios.

Goodness of fit
The first thing we check is whether the SM is a good fit to the data. We performed a MC simulation of COHERENT to calculate the p value of the SM for the different bin configurations. The results are shown in tab. 1. We find that the SM is a good fit to the data, independent of the number of bins used in the analysis.

NSI parameters one-at-a-time
In this section we focus on only one non-zero NSI parameter at a time; the next section covers the case where all relevant NSI parameters are allowed to be non-zero simultaneously. We again perform a MC simulation assuming the best fit point and calculate the p value for each value of $\epsilon^V_{ee}$ for various binning configurations in fig. 2. In the simulation we used the best fit NSI value of the underlying model as signal and varied the signal normalization and neutron background normalization pull terms according to a normal distribution. We then applied Poisson fluctuations to the sum of signal and backgrounds in the individual bins. We then calculated the test statistic using this distribution as observed data to obtain a histogram of test statistics. The p value is calculated as the fraction of test statistics which are larger than the one calculated using the observed CsI data.
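The pseudo-experiment loop just described can be sketched as follows (names and the trivial example test statistic are ours; in the real analysis `ts_func` would perform the minimization of eq. (4.1) on each pseudo-data set):

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_p_value(ts_obs, n_sig, n_neut, n_bkg, ts_func,
               sig_a=0.28, sig_b=0.25, n_mc=1000):
    """Estimate the p value with pseudo-experiments: draw the pull terms
    from normal distributions, Poisson-fluctuate the total prediction,
    and count how often the TS exceeds the observed value."""
    n_sig, n_neut, n_bkg = map(np.asarray, (n_sig, n_neut, n_bkg))
    ts_mc = np.empty(n_mc)
    for k in range(n_mc):
        a = rng.normal(0.0, sig_a)   # signal-normalization fluctuation
        b = rng.normal(0.0, sig_b)   # neutron-background fluctuation
        mu = (1 + a) * n_sig + (1 + b) * n_neut + n_bkg
        pseudo_data = rng.poisson(np.clip(mu, 0.0, None))
        ts_mc[k] = ts_func(pseudo_data)
    return float(np.mean(ts_mc >= ts_obs))
```

The empirical distribution of `ts_mc` replaces the χ² distribution that Wilks' theorem would assume, which is the key difference from a naive analysis.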
From fig. 2 we see that the 2T,1E analysis leads to the best constraints. It should be noted that constraints from the analysis which uses 144 bins flatten out for large NSI values, which leads to $\epsilon^V_{ee} \sim \mathcal{O}(0.1)$ being excluded at only the 1σ level and even larger NSI values being allowed within 2σ. These results are not physical and arise because for large NSI values the pull term for the signal normalization is close to −1, which drastically reduces the number of signal events but increases the value of the TS by at most $(1/0.28)^2 \approx 13$, a small change compared to the contributions to the TS from the individual bins, which sum to around 150. One possibility to restore the physicality of the results is to introduce asymmetric pull terms which are steeper for negative values of the normalization. In appendix A we present an alternative parametrization of the pull terms and derive the resulting constraints; also in that case two timing and one energy bin lead to the best constraints. Fig. 2 shows another important feature of the CEνNS data, namely the limitations of Wilks' theorem. A common error in statistical analyses is to assume that the test statistic follows a χ² distribution with mean equal to the number of degrees of freedom (dof's) according to Wilks' theorem [97]. This is often not correct and can lead to significantly incorrect results [98][99][100], as has been pointed out recently in the context of short baseline neutrino oscillations [101][102][103][104][105][106]. In fact, for a low number of events per bin the test statistic is not expected to follow a χ² distribution, as the necessary condition for Wilks' theorem that enough data is observed is not met (see for example [100]). To overcome this problem it is necessary to analyze the CsI data by performing a MC estimation of the distribution of the test statistic defined in eq. (4.1).
The dashed lines in fig. 2 show the p value curves obtained using Wilks' theorem, compared to the MC results shown as solid lines. It is notable that Wilks' theorem is not satisfied even for one bin. Furthermore, there is no clear pattern as to whether Wilks' theorem leads to tighter or looser constraints compared to the MC, so it is not possible, and indeed wrong, to claim that results obtained using Wilks' theorem are "conservative" or "aggressive." Instead, the MC simulation is required to derive sensible constraints on any physical parameters.
While the validity of Wilks' theorem has been checked for the significance of the observation using the total number of events in the case of the most recent COHERENT argon data [107,108], such checks are always important, as the number of detected CEνNS events is expected to be small in many CEνNS experiments (in particular those using accelerators as a neutrino source). Hence a consistent and correct analysis needs to rely on the use of a MC.
The constraints using two timing and one energy bin for only one non-zero NSI parameter at a time are shown in fig. 3. The constraints on $\epsilon^V_{ee}$ are tighter than those on $\epsilon^V_{\mu\mu}$. Furthermore, the best fit point for $\epsilon^V_{\mu\mu}$ has a lower p value, similar to the p value of the SM (see tab. 1), indicating that the introduction of $\epsilon^V_{\mu\mu}$ does not lead to an improved fit. This can be understood easily since $\epsilon^V_{\mu\mu}$, unlike $\epsilon^V_{ee}$, affects both timing bins. The observed number of events in the prompt bin is larger than the SM prediction, whereas the number of observed events in the delayed bin is smaller than the predicted one; there is no value of $\epsilon^V_{\mu\mu}$, in combination with the pull terms, which can bring both bins into better agreement with observations. The $\epsilon^V_{\mu\mu}$ curve also exhibits two maxima, corresponding to the two degenerate points which lead to the same cross section as the SM. The $\epsilon^V_{ee}$ curve also has two maxima; however, since the best fit points are closer together, the separation between the maxima is not visible.
The constraints on $\epsilon^V_{e\mu}$ and $\epsilon^V_{e\tau}$ are similar, whereas $\epsilon^V_{\mu\tau}$ is less constrained. While the allowed region for the flavor diagonal NSI parameters is asymmetric and the CsI data prefer larger positive NSI parameters, the constraint on flavor changing NSI parameters is symmetric around $\epsilon^V_{\alpha\beta} \approx 0$, as there is no flavor changing SM contribution to the cross section (see eq. (2.8)). The direction of the asymmetry for the flavor diagonal elements is easily understood as $g_n^V \approx -\tfrac12$.

Multiple NSI parameters
Next we consider the case where all 5 NSI parameters are non-zero simultaneously. From the right plot in fig. 2 we see that also in this case two timing and one energy bin lead to the best constraints 4 . We again see that Wilks' theorem is not fulfilled and the derivation of the results via a MC simulation is necessary.
In fig. 4 we present the results for the flavor-diagonal ($\epsilon_{ee}^V$, $\epsilon_{\mu\mu}^V$) and flavor off-diagonal NSI parameters ($\epsilon_{e\mu}^V$, $\epsilon_{e\tau}^V$, $\epsilon_{\mu\tau}^V$), marginalized over the remaining four NSI parameters, using two timing and one energy bin, evaluated with a MC simulation. Tab. 2 shows a summary of the 1σ, 90% C.L., 2σ and 3σ ranges of the NSI parameters derived in this section and in the previous one. These constraints, which are for $\epsilon^{u,V} = \epsilon^{d,V}$, can be easily translated to constraints on quark flavor specific NSI via
$$\epsilon_{\alpha\beta}^V = (2 + Y_n)\,\epsilon_{\alpha\beta}^{u,V} + (1 + 2 Y_n)\,\epsilon_{\alpha\beta}^{d,V}$$
with the neutron fraction $Y_n = N_n/N_p = N_n/N_e$ of the material [19].

Figure 4: Results for flavor (off-)diagonal NSI parameters, marginalized over the remaining NSI parameters, using the data from the COHERENT CsI run with two timing and one energy bin in our MC analysis.

Table 2: Allowed ranges of the NSI parameters derived using two timing and one energy bin with a MC simulation of the COHERENT CsI data. The constraints apply to mediator masses ≳ 100 MeV. There are two disjoint regions when allowing only for non-zero $\epsilon_{\mu\mu}^V$.
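As a minimal illustration of this translation (a hypothetical helper, not part of the released code):

```python
# Translate quark-level NSI couplings into the effective vector coupling
# probed by CEvNS, following eps^V = (2+Y_n)*eps^u + (1+2*Y_n)*eps^d
# with neutron fraction Y_n = N_n/N_p.
def eps_effective(eps_u, eps_d, Z, N):
    y_n = N / Z  # neutron fraction of the target material
    return (2.0 + y_n) * eps_u + (1.0 + 2.0 * y_n) * eps_d

# Example: equal up- and down-quark couplings on cesium (Z=55, N=78),
# where the prefactors combine to (3 + 3*Y_n)
print(eps_effective(0.1, 0.1, 55, 78))
```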
Comparing the values obtained when allowing only one NSI parameter at a time to other constraints on NSI from oscillation measurements (see [109] for a global analysis), we find that our COHERENT constraints on $\epsilon_{e\mu}^V$ and $\epsilon_{e\tau}^V$ improve over them, while atmospheric oscillation data leads to stronger constraints on $\epsilon_{\mu\tau}^V$ than COHERENT CsI.

LMA Dark solution
In the presence of NSI the Hamiltonian describing neutrino evolution exhibits a symmetry which leads to a solution for the $\theta_{12}$ mixing angle in the second octant, the so-called LMA Dark solution [110][111][112][113]. When the term in eq. (2.7) is added to the SM Lagrangian, the Hamiltonian which describes neutrino oscillations in the presence of matter is modified to
$$H_\nu = \frac{1}{2E}\, U_\text{vac}\, \text{diag}\!\left(0, \Delta m^2_{21}, \Delta m^2_{31}\right) U_\text{vac}^\dagger + \sqrt{2}\, G_F \left[ N_e(x)\, \text{diag}(1,0,0) + \sum_f N_f(x)\, \epsilon^f \right],$$
with the mixing matrix in vacuum $U_\text{vac}$, where $N_f(x)$ is the fermion density as a function of the distance the neutrinos have travelled in matter. As a consequence of the CPT theorem this Hamiltonian is invariant under $H_\nu \to -(H_\nu)^*$, see [114,115] for a discussion in the context of NSI. This transformation corresponds to changing the vacuum mixing parameters and simultaneously the NSI parameters as
$$\theta_{12} \to \frac{\pi}{2} - \theta_{12}, \qquad \delta \to \pi - \delta, \qquad \Delta m^2_{31} \to -\Delta m^2_{31} + \Delta m^2_{21},$$
$$\epsilon_{ee} - \epsilon_{\mu\mu} \to -(\epsilon_{ee} - \epsilon_{\mu\mu}) - 2, \qquad \epsilon_{\tau\tau} - \epsilon_{\mu\mu} \to -(\epsilon_{\tau\tau} - \epsilon_{\mu\mu}), \qquad \epsilon_{\alpha\beta} \to -\epsilon_{\alpha\beta}^* \ (\alpha \neq \beta).$$
In [7] an analysis using the total number of events of the COHERENT CsI observation was shown to be sufficient to rule out the LMA Dark solution at around 2σ, assuming only non-zero $\epsilon_{ee}^{q,V}$ and $\epsilon_{\mu\mu}^{q,V}$ (for $q = u$ or $d$). Here we revisit this result using two timing and one energy bin in our statistical framework. Fig. 5 shows the allowed 1σ, 2σ, 3σ ranges from COHERENT CsI in the $\epsilon_{\mu\mu}^{u,V}$-$\epsilon_{ee}^{u,V}$ plane (all other NSI parameters, including the $\epsilon^{d,V}$ parameters, are fixed to zero in this subsection) compared to the 2σ regions for LMA and LMA Dark [109]. Only a small part of the LMA Dark region is contained in the COHERENT 3σ region. Therefore our results increase the exclusion of the second octant for $\theta_{12}$ in the presence of NSI compared to [7]. This result is crucial as it not only shows that the upper octant of $\theta_{12}$ is disfavored, it also ensures that measurements of the mass ordering from DUNE, JUNO, and T2HK [117][118][119] will be robust under the inclusion of NSI with a heavy enough mediator [19,111].
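The invariance under $H_\nu \to -(H_\nu)^*$ can be checked numerically for any Hermitian Hamiltonian: since $e^{-i(-H^*)L} = (e^{-iHL})^*$, every oscillation probability $|\langle\beta|e^{-iHL}|\alpha\rangle|^2$ is unchanged. A small sketch (our own illustration, using a generic random Hermitian matrix rather than the full NSI Hamiltonian):

```python
# Check that H -> -H* leaves all oscillation probabilities invariant.
import numpy as np

def evolve(H, L):
    """Evolution operator exp(-i H L) for Hermitian H via eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return v @ np.diag(np.exp(-1j * w * L)) @ v.conj().T

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
H = A + A.conj().T                      # generic 3x3 Hermitian "Hamiltonian"

P = np.abs(evolve(H, 1.7)) ** 2         # P(alpha -> beta) for all channels
P_dark = np.abs(evolve(-H.conj(), 1.7)) ** 2
print(np.allclose(P, P_dark))           # the two Hamiltonians are degenerate
```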

The impact of more statistics
As we have demonstrated in the previous sections, Wilks' theorem is not applicable for the current CsI data. However, one might wonder whether with more statistics the requirements for Wilks' theorem are fulfilled. Furthermore, it is instructive to see if for future data 144 bins still lead to worse constraints than fewer bins.

Figure 5: Allowed 2σ regions for LMA and LMA Dark [109]. The green, blue, and pink regions indicate the allowed range from COHERENT CsI at 1σ, 2σ, 3σ. We used two timing and one energy bin for our MC estimation of the significance.
To answer these questions we simulate 100 times more CsI data with the current experimental configuration, assuming the SM will be measured. For the background we conservatively assume a flat distribution of 500 events per bin in each of the 144 bins. We calculate the TS assuming only $\epsilon_{ee}^V$ to be non-zero. In fig. 6 we show the results using two timing and one energy bin, two timing and four energy bins, and 144 bins. We also show the difference between the results using Wilks' theorem and the MC simulation. Even with more statistics the conclusions for the ideal analysis strategy derived from the current data remain valid: for 144 bins Wilks' theorem is not fulfilled and the best constraints are obtained using two timing and one energy bin. These results further demonstrate that, independent of the shape of the background data, Wilks' theorem fails due to the low number of signal events and the low signal-to-background ratio.
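The failure of Wilks' theorem at low counts can be illustrated with a single-bin Poisson toy (an assumed setup, far simpler than the full CsI likelihood): the MC distribution of the likelihood-ratio test statistic visibly deviates from the $\chi^2_1$ expectation.

```python
# Toy MC: distribution of the likelihood-ratio test statistic for a single
# low-count Poisson bin, compared to the chi^2_1 quantile Wilks' theorem
# would predict. The mean of 3 events is an illustrative choice.
import numpy as np

rng = np.random.default_rng(1)
mu_true = 3.0                                   # low-statistics regime
n = rng.poisson(mu_true, size=200_000)

# TS = -2 log[L(mu_true)/L(mu_hat)] with mu_hat = n for a single bin;
# the n = 0 term n*log(n/mu) is zero, handled by the np.where below.
ts = 2.0 * (mu_true - n + n * np.log(np.where(n > 0, n, 1) / mu_true))

mc_90 = np.quantile(ts, 0.90)
print(f"MC 90% quantile: {mc_90:.2f}  vs  chi^2_1 90% quantile: 2.71")
```

Because the counts are discrete and few, the MC quantile sits well below 2.71, so cuts placed at the $\chi^2$ value would misstate the confidence level.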

Summary and Conclusion
The first observation of CEνNS in 2017 has sparked large interest in using this process to constrain Standard Model parameters as well as to test a broad range of new physics scenarios. However, to obtain statistically meaningful constraints from CEνNS, a correct analysis procedure needs to be applied. To this end, in this manuscript we revisited the analysis of CEνNS and established an adequate analysis procedure to obtain reliable results.
As concrete examples we showed the difference between our statistical approach and commonly used incorrect approaches for the SM prediction as well as for non-standard interactions. We focused on the publicly available data from the COHERENT CsI observation of CEνNS in 2017 as an example; however, our results apply to a wide class of CEνNS experiments. First, we identified two general subtleties of the CEνNS analysis which are shortcomings of previous works, namely the effect of smearing and the choice of the number of bins. Since many CEνNS detectors measure the number of PE instead of the nuclear recoil energy directly, we first pointed out that smearing of the predicted events needs to be taken into account. This accounts for the probability that a nuclear recoil event sometimes yields a different number of PE than the average. Ideally, the smearing via a Poisson distribution, which translates the number of events in raw PE bins into the number of events in true, detected PE bins, is applied to the sum of signal and background events. However, we found that cuts have already been imposed on the publicly released CsI background data, as we could not find a good fit of an event distribution in terms of raw PE to the provided distribution in true PE. Furthermore, unlike other analyses, we use the sum of all three background data files to increase the statistics of the background.
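The Poisson smearing step described above can be sketched as follows (a toy raw-PE spectrum and an illustrative PE range, not COHERENT's actual detector response):

```python
# Redistribute predicted counts from "raw PE" bins into detected-PE bins,
# assuming the detected PE number fluctuates around the raw value
# following a Poisson distribution.
import math

def smear(raw_spectrum, n_pe_max=60):
    """raw_spectrum: {raw_pe: events}. Returns events per detected-PE value."""
    detected = [0.0] * (n_pe_max + 1)
    for raw_pe, events in raw_spectrum.items():
        for n in range(n_pe_max + 1):
            # Poisson probability of detecting n PE given a mean of raw_pe
            p = math.exp(-raw_pe) * raw_pe**n / math.factorial(n)
            detected[n] += events * p
    return detected

toy = {10: 100.0, 20: 50.0}   # toy spectrum: events at 10 and 20 raw PE
det = smear(toy)
print(f"total events preserved: {sum(det):.1f}")
```

The smearing only redistributes events across PE bins, so the total is preserved up to the truncation of the Poisson tail at `n_pe_max`.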
We then investigated the question of the ideal number of bins. As background fluctuations can mimic the signal, choosing large bins to increase the number of events per bin, and hence decrease the Poisson fluctuations per bin, is desirable; however, if new physics predicts a certain timing or energy behavior, too large bins can decrease the sensitivity of the analysis. Therefore, the ideal number (and width) of bins depends on the model one wants to test. As we focused in this manuscript on flavor specific NSI with a heavy mediator (with mass above 100 MeV), which does not lead to an easily measurable change in the energy spectrum of the signal, using two timing bins which include the prompt and delayed signal, without additional energy information, led to the best constraints, independent of whether all NSI parameters are allowed to be non-zero simultaneously or just one at a time. We compared several bin configurations and showed that the commonly used configuration of 12 timing times 12 energy bins leads to the worst constraints on NSI parameters, confirming that the correct choice of bins has a substantial impact on the allowed regions for new physics parameters. On the other hand, we find that for all bin configurations the SM is a good fit to the data.
Additionally, we emphasized for the first time in the literature that Wilks' theorem is not fulfilled for the COHERENT CEνNS data due to the small number of signal events. This statement remains true even for an exposure increased by a factor of 100. Hence a MC estimation of the $p$ value is necessary. Using this improved statistical framework we presented the results for flavor (off-)diagonal NSI parameters, allowing only one of them to be non-zero at a time or considering all of them to be non-zero simultaneously. Additionally, we revisited the constraints on the LMA Dark solution, a degeneracy in the Hamiltonian for neutrino oscillations in the presence of NSI, which is now disfavored at close to 3σ. This result is of crucial importance for the robustness of current and upcoming oscillation experiments probing the mass ordering question, and thus also for neutrinoless double beta decay and cosmological constraints on neutrino properties. In all cases our results provide the strongest and most statistically sound constraints in the literature, emphasizing the necessity of the improved analysis framework.
Our results show that using the correct statistical approach, as well as accounting for experimental subtleties like smearing, the use of all available background data, and the choice of binning, has an important impact on the results, both for the SM and for new physics models.
Our proposed analysis approach is accompanied by the first publicly released code to calculate and analyze CEνNS. The code allows for an easy use of our analysis, as smearing, the background template, and the calculation of the $p$ value are already implemented, and it can be extended in a straightforward way to probe other new physics scenarios. Furthermore, we provide our numerical results in the form of data files.

This new pull term is derived from Poisson-based assumptions. We demonstrate in fig. 7 the impact of the different parametrizations of the pull terms on the constraint on $\epsilon_{ee}^V$, marginalizing over the remaining NSI parameters, for different numbers of bins.
As expected, with the new pull term parametrization the flattening of the constraints for large NSI parameters is avoided, leading to physically sensible results. The impact of the new pull term parametrization is most prominent for more bins and large NSI parameters, while for small NSI parameters and two bins there is no difference between the constraints using Gaussian pull terms and the new pull terms. We should stress that while the new pull term results in more "sensible" constraints, there is no guarantee that either parametrization is correct. In practice, one should obtain the probability density function for the constraint used for a given systematic and be cautious when a pull goes outside the provided region.
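As a toy illustration of why the parametrization matters (assumed functional forms, not necessarily the exact ones used in our analysis): a Gaussian pull stays finite even when the nuisance parameter drives the normalization $1+a$ toward zero or negative values, while a pull derived from an auxiliary Poisson counting measurement diverges there, preventing the fit from flattening out at unphysical normalizations.

```python
# Compare a Gaussian pull term with a pull term derived from an auxiliary
# Poisson counting measurement of the same normalization nuisance `a`.
import math

M_AUX = 100                      # counts in the auxiliary measurement (toy)
SIGMA = 1.0 / math.sqrt(M_AUX)   # matching Gaussian fractional uncertainty

def gaussian_pull(a):
    return (a / SIGMA) ** 2

def poisson_pull(a):
    mu = (1.0 + a) * M_AUX       # expected auxiliary counts for nuisance a
    return 2.0 * (mu - M_AUX + M_AUX * math.log(M_AUX / mu))

for a in (0.1, -0.5, -0.9):      # the Poisson pull blows up as 1+a -> 0
    print(f"a={a:+.1f}: gaussian={gaussian_pull(a):6.1f}  "
          f"poisson={poisson_pull(a):6.1f}")
```

For small pulls the two penalties nearly coincide, which matches the observation above that the choice is irrelevant for small NSI parameters and two bins.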
As we again find that using two timing and one energy bin leads to the best constraints, and there is no difference between the two pull term parametrizations in that case, we derive the constraints in the main text using standard Gaussian pull terms.