1 Introduction

A good understanding of single pion production by neutrinos with few-GeV energies is important for current and future oscillation experiments, where pion production is either a signal process, or a large background for analyses which select quasi-elastic events. At these energies the dominant production mechanism is via the production and subsequent decay of hadronic resonances.

Complete models of neutrino–nucleus single pion production interactions are usually factorized into three parts: the neutrino–nucleon cross section; additional nuclear effects which affect the initial interaction; and the “final state interactions” (FSI) of hadrons exiting the nucleus. Experimental data on nuclear targets presents a confusing picture, with recent data from the MINER\(\nu \)A  [1, 2] and MiniBooNE [3] experiments in poor agreement with each other in the framework of current theoretical models [4, 5]. An additional problem is the disagreement between measurements of the neutrino–nucleon single pion production cross section in the 100 MeV to few-GeV energy range most relevant for current and planned neutrino oscillation experiments. The axial form factor for pion production on free nucleons cannot be constrained by electron scattering data, so it relies upon data from Argonne National Laboratory’s 12 ft bubble chamber (ANL) and Brookhaven National Laboratory’s 7 ft bubble chamber (BNL). However, these datasets differ in normalization by 30–40 % for the leading pion production process \(\nu _{\mu }p\rightarrow \mu ^{-}p\pi ^{+}\), which leads to large uncertainties in the predictions for oscillation experiments [611], as well as in the interpretation of data taken on nuclear targets [12].

It has long been suspected that the discrepancy between ANL and BNL was due to an issue with the normalization of the flux prediction from one or both experiments,Footnote 1 and it has been shown by other authors that their published results are consistent within the experimental uncertainties provided [6, 16]. In Ref. [17], we presented a method for removing flux normalization uncertainties from the ANL and BNL \(\nu _{\mu }p\rightarrow \mu ^{-}p\pi ^{+}\) measurements by taking ratios with charged-current quasi-elastic (CCQE) event rates in which the normalization cancels. Then we obtained a measurement of \(\nu _{\mu }p\rightarrow \mu ^{-}p\pi ^{+}\) by multiplying the ratio by an independent measurement of CCQE (which is well known for nucleon targets). Using this technique, we found good agreement between the ANL and BNL \(\nu _{\mu }p\rightarrow \mu ^{-}p\pi ^{+}\) datasets. In this work, we extend that method to include the subdominant \(\nu _{\mu }n\rightarrow \mu ^{-}p\pi ^{0}\) and \(\nu _{\mu }n\rightarrow \mu ^{-}n\pi ^{+}\) channels, and we use the resulting data, along with the \(Q^{2}\)-spectra (where \(Q^{2}\) is the four-momentum transfer) from the same experiments, to constrain the parameters of the GENIE single pion production model [18]. While more sophisticated single pion production models exist [1921], the GENIE generator is widely used by current and planned neutrino oscillation experiments, so tuning the generator parameters represents a pragmatic approach to improving its description of available data. We find that the reanalyzed data, where the normalization discrepancy has been resolved, is able to significantly reduce the uncertainties on the pion production parameters. We also find that the non-resonant background prediction from GENIE needs to be significantly reduced to fit the data.

Reduced uncertainties on pion production parameters are vital for current and future neutrino oscillation experiments, which have very stringent systematic uncertainty requirements [22, 23]. We recommend that our new uncertainties should be used by experiments which use the GENIE neutrino interaction generator, and the reanalyzed ANL and BNL datasets presented here and in Ref. [17] should be used instead of the published ANL and BNL datasets for future model comparisons.

The paper is organized as follows. In Sect. 2, we describe the datasets used in this analysis. In Sect. 3 we describe the GENIE single pion production model and compare the nominal GENIE model and error bands with the data. The \(\chi ^{2}\) statistic which is minimized and the fit machinery are discussed in Sect. 4.1; the fit results are presented in Sect. 4.2; and there is a discussion of the goodness of fit in Sect. 4.3. Finally, our conclusions are presented in Sect. 5.

2 Datasets used in this analysis

In this work, we use the \(Q^{2}\) and \(E_{\nu }\)-spectra from ANL and BNL for all three charged-current single pion production modes (\(\nu _{\mu }p\rightarrow \mu ^{-}p\pi ^{+}\), \(\nu _{\mu }n\rightarrow \mu ^{-}p\pi ^{0}\) and \(\nu _{\mu }n\rightarrow \mu ^{-}n\pi ^{+}\)), giving a total of 12 datasets.

The \(Q^{2}\)-dependent distributions used are presented as flux-integrated event rates without any invariant mass cut applied, and were digitized from Refs. [24] (ANL) and [25] (BNL) for this work. To produce flux-integrated event rate predictions with GENIE, the flux was taken from Refs. [13] (ANL) and [15] (BNL).

The \(E_{\nu }\)-dependent distributions for both ANL and BNL are taken from the reanalysis of \(\nu _{\mu }p\rightarrow \mu ^{-}p\pi ^{+}\) data presented in Ref. [17] and the reanalysis of \(\nu _{\mu }n\rightarrow \mu ^{-}p\pi ^{0}\) and \(\nu _{\mu }n\rightarrow \mu ^{-}n\pi ^{+}\) using the same technique and presented in Appendix A. These datasets are neutrino–deuterium cross sections, as no correction has been applied to account for deuterium nuclear effects. Additionally, in Appendix B we present reanalyzed results for the three pion production channels in which an additional correction has been applied to include the effect of the invariant mass cut \(W \le 1.4\) GeV on the \(E_{\nu }\)-dependent distributions. The reanalyzed results with \(W \le 1.4\) GeV were not used in the present work, but we provide it for use in future model comparisons.

The \(E_{\nu }\)-dependent distributions for both ANL and BNL are shown for all three single pion production channels in Fig. 1 along with other bubble chamber measurements available for these channels. The original ANL and BNL results are also shown so the effect of the reanalysis can be seen. It is clear that the reanalysis affects all channels, although the effect is more pronounced for the dominant \(\nu _{\mu }p\rightarrow \mu ^{-}p\pi ^{+}\) channel, where the statistical errors are smaller and biases are easier to see. The ANL and BNL datasets agree well in all three channels after the reanalysis. In Fig. 1a, b, the BEBC data on a hydrogen target has an invariant mass cut of \(W \le 2\) GeV [26], which removes contributions from diffractive processes. The FNAL data on a hydrogen target is selected with an invariant mass cut of \(W \le 1.4\) GeV, in order to isolate the \(\Delta ^{++}\) contribution to the cross section, which also cuts out any diffractive contributions from the cross section [27]. Additionally, the FNAL result was scaled by 14 % to account for \(\Delta ^{++}\) contributions with \(W > 1.4\) GeV.

Despite the caveats associated with the subdominant \(\nu _{\mu }n\rightarrow \mu ^{-}p\pi ^{0}\) and \(\nu _{\mu }n\rightarrow \mu ^{-}n\pi ^{+}\) channels, we recommend that all three channels are used for future comparisons with ANL and BNL data. It should be stressed that most of the deficiencies in the reanalyzed results detailed here are also present in the original ANL and BNL results, and should be borne in mind when using reanalyzed or published ANL and BNL results.

Fig. 1
figure 1

The published and extracted ANL and BNL data are compared with other measurements of the three pion production channels [2629]. All data is on hydrogen and deuterium targets, except for SKAT 1989 data which was taken on heavy freon (CF\(_{3}\)Br). Note that both published and reanalyzed ANL and BNL data shown here have no invariant mass cut in the event selection, whereas the other datasets have an invariant mass cut of \(W \le 2 \mathrm{GeV}\) applied unless otherwise mentioned

A final note of caution regarding the use of these corrected datasets is that there is a hidden correlation between the three channels for each experiment. As the CCQE events used in the correction are common to all channels, statistical errors are correlated in a way which is difficult to quantify. An example of the problem can be seen by looking at the three ANL channels for \(1.0 \le E_{\nu } \le 1.1\) GeV where an upward fluctuation of the CCQE event rate leads to a decrease in the reanalyzed cross section. It should be noted that this problem is also present in the published ANL cross section results because they used the measured CCQE event rate to correct their flux prediction. This problem is not dealt with in the fits presented in this work, we simply use the statistical errors from the reanalysis without considering correlations, but it will lead to an increase in the \(\chi ^{2}\) of the fits (because the \(\chi ^{2}\) penalty for a large statistical fluctuation is applied three times rather than once).

3 GENIE single pion production model

Table 1 Variable parameters in the GENIE single pion production model [45]. The normalization of the axial form factor is not a variable parameter in GENIE currently, but it is varied in this work as described in the text

The single pion production model in GENIE is described in Ref. [18] and does not change significantly between the GENIE major versions 2.6.X and 2.8.X (X denotes the minor version number) investigated in this study.Footnote 2 All processes simulated in GENIE use the Bodek–Ritchie RFG model to describe the initial state nucleon momentum distribution [30] for all nuclear targets, including deuterium. In GENIE, single pion production is separated into resonant and non-resonant terms, with interference terms between the two neglected. The resonant component (RES) is a modified version of the Rein–Sehgal (R-S) model [31]. In the original R-S model, the production and subsequent decay of 18 nucleon resonances with invariant masses \(W \le 2\) GeV are considered. In GENIE, only 16 resonances are included, based on the recommendation of the Particle Data Group [32]. The cross section calculation has not been modified to include lepton mass terms, but the effect of lepton mass terms on the phase space boundaries is taken into account. The cross section is cut off at a tunable invariant mass value, which is \(W \le 1.7\) GeV by default (and in this study). No in-medium modifications to resonances are considered, and interferences between resonances are neglected in the calculation. By default, the resonances decay isotropically in their center of mass frame. In general, there is an additional contribution to single pion production from coherent pion production processes, which are modeled (by default) in GENIE using the Rein–Sehgal coherent model [33] with lepton mass corrections included [34]. However, for the ANL and BNL channels considered here, the selection criteria include requirements on the struck or spectator nucleon, which effectively excludes coherent pion production as the deuterium is no longer bound in the final state. As such, coherent contributions to single pion production are not considered further in this work.Footnote 3

The original Rein–Sehgal model in Ref. [31] includes non-resonant single pion production as an additional resonance amplitude, while in GENIE the non-resonant component is implemented as an extension of the deep inelastic scattering model. The non-resonant (DIS) contribution to the GENIE single pion production model is calculated using the Bodek–Yang parametrization [36], with other relevant parameters described in detail in Ref. [18]. Hadronization is described by the AKGY model [37], which uses KNO scaling [38] for invariant masses of \(W \le 2.3\) GeV, and PYTHIA [39] for invariant masses of \(W \le 3.0\) GeV, with a smooth transition in between. The low-W KNO model is tuned to data from the Fermilab 15-foot bubble chamber experiment [40], and the high-W PYTHIA model is tuned to BEBC data [41]. We note that retuning of the PYTHIA model has been discussed elsewhere [42], although is not considered here as it affects larger W values than are relevant for this study.

A major difference between the GENIE versions 2.6.X and 2.8.X is the change to the default Final State Interaction (FSI) model, which is applied to all outgoing particles produced at the vertex for both the resonant and non-resonant contributions to single pion production. The default FSI model for both versions of GENIE is the hA intranuclear cascade model [18], which is tuned to \(\pi ^{+}\)\(^{56}Fe\) and p\(^{56}Fe\) data, then extrapolated to other targets based on \(A^{2/3}\) scaling (where A is the atomic number). In GENIE v2.6.X, FSI effects are negligible for deuterium, but the hA model was retuned for GENIE v2.8.X [43], which leads the deuterium FSI to reduce the total cross section predictions for all channels relevant for this work by 20–30 %. In this work we have ignored this difference and make the assumption that interactions on deuterium can be treated as interactions on quasi-free nucleons which are only loosely bound together, and so neglect FSI effects. Low-\(Q^{2}\) bins (\(Q^{2}\) < 0.1 GeV\(^{2}\)) are not included in the fit to avoid the region where FSI effects are expected to have a significant effect in deuterium. We note that in Ref. [44] a careful study of FSI effects for pion production interactions on deuterium was carried out. This work found that interactions between the final state nucleons significantly modifies the cross section for the \(\nu _{\mu }p\rightarrow \mu ^{-}p\pi ^{+}\) and \(\nu _{\mu }n\rightarrow \mu ^{-}n\pi ^{+}\) channels, the most notable feature being a suppression of the cross section for very forward pions. A more careful treatment of FSI based on the work presented in Ref. [44] would be an improvement to future iterations of this work, but the calculation does not currently predict the entire final state of the interaction so is not ready to be implemented in a generator.

In GENIE, there are a number of systematic parameters which can be varied to change the single pion production model [45]. These parameters are summarized in Table 1. Note that although GENIE allows the normalization of charged-current non-resonant single pion production on protons and neutrons separately, we have grouped them here into a single category (labeled “DIS”) in the absence of any reason to treat them differently.Footnote 4

The axial form factor used for resonant pion production in GENIE is given by

$$\begin{aligned} F_{\mathrm {A}}(Q^{2}) = \frac{F_{\mathsf {A}}(0)}{\left( 1 + \frac{Q^{2}}{\left( M_{\text {A}}^{\text {RES}} \right) ^{2}}\right) ^{2}}, \end{aligned}$$
(1)

where \(F_{\mathsf {A}}(0) \equiv \frac{5}{3}Z \equiv g_{\mathrm {A}} = 1.267\), Z is a renomalization factor for the axial-vector coupling constant of a quark obtained from data in Ref. [46] (also considered in Ref. [47]) and \(M_{\text {A}}^{\text {RES}} \) is the resonant axial mass, available as a parameter in GENIE’s reweighting framework. The normalization of the axial form factor is not available in the GENIE reweighting framework (changing its value requires events to be regenerated with a modified value of the “RS-Zeta” parameter), but a similar parameter for the overall resonant normalization is available. Modifying \(F_{\mathsf {A}}(0)\) is the more physically motivated alternative, but as modifying the resonant normalization is more convenient for users, we perform two fits, each with one of these parameters modified.

The nominal GENIE v2.8.2 cross sections for the three single pion production channels considered in this work (\(\nu _{\mu }p\rightarrow \mu ^{-}p\pi ^{+}\), \(\nu _{\mu }n\rightarrow \mu ^{-}p\pi ^{0}\) and \(\nu _{\mu }n\rightarrow \mu ^{-}n\pi ^{+}\)) are shown as a function of the neutrino energy \(E_{\nu }\) in Fig. 2, and compared with ANL and BNL data. The nominal GENIE v2.8.2 event rate predictions as a function of \(Q^{2}\), produced using the ANL and BNL fluxes, are compared separately to ANL and BNL data in Fig. 3. The GENIE prediction for the \(Q^{2}\) distributions is normalized separately for ANL and BNL such that the total prediction is equal to the measured rate summed over all three channels. The \(Q^{2}\) predictions shown are therefore shape-only, but with the relative normalization between the different pion production channels preserved.Footnote 5 All data considered have no invariant mass cuts applied and the event selection in GENIE is based on the particles produced at the initial interaction vertex, not those surviving GENIE’s FSI model as previously discussed. In Figs. 2 and 3, the GENIE prediction is also shown broken down into resonant (RES) and non-resonant (DIS) contributions. Additionally, the dominant \(\Delta \) contribution to the RES component is shown separately for reference. The total GENIE prediction is the incoherent sum of the RES and DIS contributions, where interference terms have been neglected. On each plot the 1\(\sigma \) error band produced by combining the nominal GENIE uncertainties on \(M_{\text {A}}^{\text {RES}}\), RES normalization, and DIS normalization is also shown for comparison.

Fig. 2
figure 2

The nominal GENIE prediction is shown as a function of \(E_{\nu }\) for the three single pion production channels of interest, and is compared to the corrected ANL and BNL data. The total prediction is broken down into the resonant (RES) and non-resonant (DIS) contributions, and additionally, the \(\Delta \)-contribution to the RES component is shown. The error band shown on the total GENIE prediction shows the 1\(\sigma \) error bands for all default GENIE parameters given in Table 1 combined in quadrature (note that \(F_{\mathsf {A}}(0)\) is not a default GENIE parameter so is not included in the error band)

Fig. 3
figure 3

The nominal GENIE prediction is shown as a function of \(Q^{2}\) for the three single pion production channels of interest, and is compared separately to the ANL and BNL data. The total prediction is broken down into the resonant (RES) and non-resonant (DIS) contributions, and additionally, the \(\Delta \)-contribution to the RES component is shown. The error band shown on the total GENIE prediction shows the 1\(\sigma \) error bands for all GENIE default parameters given in Table 1 combined in quadrature (note that \(F_{\mathsf {A}}(0)\) is not a default GENIE parameter, so it is not included in the error band)

It is clear from Figs. 2 and 3 that the nominal GENIE prediction cannot describe all of the pion production channels well for the reanalyzed datasets. In Fig. 2, it is noticeable that, while the measured cross sections for the subdominant \(\nu _{\mu }n\rightarrow \mu ^{-}p\pi ^{0}\) and \(\nu _{\mu }n\rightarrow \mu ^{-}n\pi ^{+}\) channels are similar, there are large differences between the nominal GENIE predictions for these channels. The non-resonant component of the GENIE prediction, which contributes strongly to these channels, appears to be too large. It can be seen from Fig. 3 that the nominal GENIE prediction fails to describe the low-\(Q^{2}\) data well for some channels. We also note that the GENIE uncertainties are larger than the data suggests, and they may be reduced by tuning the GENIE model to the ANL and BNL data. These observations motivate this work.

4 Fitting the GENIE model

In this section, the datasets described in Sect. 2 are used to constrain the GENIE model introduced in Sect. 3. The \(\chi ^{2}\) statistic which is minimized is given in Sect. 4.1, and results are given in Sect. 4.2. A discussion of the goodness of fit is given in Sect. 4.3. The MINUIT package [48] as implemented in the ROOT library [49] is used to perform all fits.

The fits are performed separately for four GENIE configurations. Both GENIE v2.6.2 and GENIE v2.8.2 predictions are fit using all parameters available in the standard version of GENIE (as described in Table 1). A fit was performed to GENIE v2.8.2 where the normalization of the resonant axial form factor, \(F_{\mathsf {A}}(0)\), is used as a fit parameter instead of the resonant normalization, as motivated in Sect. 3. Finally, a fit was performed to GENIE v2.8.2 without the resonant normalization or \(F_{\mathsf {A}}(0)\) to investigate the effect that correlations between the axial mass (which has a strong effect on the normalization of the cross section) and the normalization parameters have on the results.

4.1 \(\varvec{\chi }^{2}\) definition

No information as regards systematic uncertainties, correlations within datasets or correlations between datasets is available, so only statistical errors are considered for all datasets, and the function to be minimized can be expressed as a sum over the datasets included in the fit. This is reasonable as the statistical uncertainties are large. Additionally, the datasets used are efficiency corrected by the experiments, but are not unfolded, so the treatment of detector effects is likely to be inadequate.Footnote 6 For these reasons, any measure of goodness of fit should be treated as approximate.

A Poisson-likelihood statistic is used for the datasets as a function of \(Q^{2}\) because many of the higher \(Q^{2}\) bins have low event rates. Note that for \(Q^{2}\) datasets, the sum is over the N bins with \(Q^{2}\) \(\ge \) 0.1 GeV\(^{2}\).

$$\begin{aligned} \chi ^{2}= & {} \sum _{Q^{2} \;\mathrm {datasets}} \left\{ 2\sum ^{N}_{i=1}\left[ \mu _{i}(\mathbf {\mathbf {x}}) - n_{i} + n_{i}\ln \frac{n_{i}}{\mu _{i}(\mathbf {\mathbf {x}})}\right] \right\} \nonumber \\&+\,\sum _{E_{\nu }\;\mathrm {datasets}}\left\{ \sum ^{N}_{i=1}\frac{\left[ n_{i}-\mu _{i} (\mathbf {\mathbf {x}})\right] ^{2}}{\sigma _{i}^{2}}\right\} \end{aligned}$$
(2)

where \(n_{i}\) and \(\mu _{i}(\mathbf {\mathbf {x}})\) are the measured and predicted number of events in the ith bin, \(\sigma _{i}\) is the statistical error on the ith bin, \(\mathbf {\mathbf {x}}\) are the model parameters varied in the fit and the inner summations are over the N bins of each dataset. \(\mathbf {\mathbf {x}}\) also contains normalization terms for ANL and BNL which affect the \(Q^{2}\) datasets only. As previously remarked, the \(Q^{2}\) datasets are shape-only in the fit, but the relative normalization between the three pion production modes is preserved separately for ANL and BNL.

Note that this statistic is appropriate for minimization, but \(\chi ^{2}/N_{\mathsf {DOF}} \) is not strictly correct as measure of the goodness of fit because the Poisson-likelihood terms contribute constant terms to the \(\chi ^{2}\). A more rigorous measure of the goodness of fit is discussed in Sect. 4.3.

Table 2 Best fit results for the four GENIE configurations used in this analysis
Fig. 4
figure 4

Correlation matrices from the GENIE v2.8.2 (RES) and GENIE v2.8.2 (\(F_{\mathsf {A}}(0)\)) fits. The GENIE v2.6.2 correlation matrix is very similar to the GENIE v2.8.2 (RES) matrix, and the GENIE v2.8.2 (no norm.) matrix is very similar to the relevant bins of both of the matrices shown

Fig. 5
figure 5

Best fit distributions and post-fit uncertainties for the four \(\nu _{\mu }p\rightarrow \mu ^{-}p\pi ^{+}\) datasets included in the GENIE v2.8.2 (RES) fit. The nominal prediction is shown for reference, and the \(\chi ^{2}\) contribution from each dataset is given in the legend for both the nominal and the best fit distributions. The nominal and best fit DIS contribution to the total GENIE prediction (RES+DIS) are also shown for reference

Fig. 6
figure 6

Best fit results and post-fit uncertainties for the four \(\nu _{\mu }n\rightarrow \mu ^{-}p\pi ^{0}\) datasets included in the GENIE v2.8.2 (RES) fit. The nominal prediction is shown for reference, and the \(\chi ^{2}\) contribution from each dataset is given in the legend for both the nominal and the best fit distributions. The nominal and best fit DIS contribution to the total GENIE prediction (RES+DIS) are also shown for reference

Fig. 7
figure 7

Best fit results and post-fit uncertainties for the four \(\nu _{\mu }n\rightarrow \mu ^{-}n\pi ^{+}\) datasets included in the GENIE v2.8.2 (RES) fit. The nominal prediction is shown for reference, and the \(\chi ^{2}\) contribution from each dataset is given in the legend for both the nominal and the best fit distributions. The nominal and best fit DIS contribution to the total GENIE prediction (RES+DIS) are also shown for reference

4.2 Results

Two fake data studies were performed to validate the fitter. First of all, Asimov [50] fake data fits were produced for all four GENIE configurations considered. These provide basic validation that the fitter found the correct minimum and give the expected size of the parameter uncertainties, which can be used to validate the fit results. Second, pull studies were performed for all GENIE configurations to check that the test statistic is an unbiased estimator of central values and uncertainties for the parameters varied in the fits. No biases were observed.

The best fit results to all 12 datasets are shown in Table 2 for the four GENIE configurations considered in this work. The parameter uncertainties given by the fits are consistent with those predicted by the Asimov fake data study. The normalization of the non-resonant background was reduced significantly in all fits, as was expected given the nominal model comparisons shown in Sect. 3. The resonant axial mass, \(M_{\text {A}}^{\text {RES}}\), was also reduced from the GENIE nominal value of \(M_{\text {A}}^{\text {RES}}\) = 1.12 GeV in all fits, although we note that there is a strong anticorrelation between \(M_{\text {A}}^{\text {RES}}\) and the RES normalization, and between \(M_{\text {A}}^{\text {RES}}\) and \(F_{\mathsf {A}}(0)\) (as can be seen in Fig. 4). The GENIE v2.8.2 fits were also repeated using GENIE’s free nucleon cross sections in order to ensure that the results are not biased by GENIE’s initial state nuclear model. It was found that the results were consistent to within 1\(\sigma \) for all parameters in all fits.

Fig. 8
figure 8

Expected \(\chi ^{2}\) distribution for the test statistic defined in Eq. 2 produced using 100,000 toy experiments. The nominal and best fit \(\chi ^{2}\) values for 12 datasets fit in Sect. 4.2 are shown for comparison. The nominal \(\chi ^{2}\) for GENIE v2.8.2 is shown, as well as the best fit \(\chi ^{2}_\mathrm{min}\) from the GENIE v2.8.2 (RES) fit

Table 3 Contributions to the nominal \(\chi ^{2}\) for GENIE v2.8.2 and to the best fit \(\chi ^2_\mathrm{min}\) for the GENIE v2.8.2 (RES) fit from each of the 12 datasets included in the fit.

The GENIE v2.6.2 and GENIE v2.8.2 (RES) results, with the same parameters available in the standard version of GENIE (as described in Table 1) give consistent results. The fit which uses the normalization of the axial form factor, \(F_{\mathsf {A}}(0)\), as a fit parameter is consistent with the other fits, although as \(F_{\mathsf {A}}(0)\) has a small \(Q^{2}\) dependence, the correlation with \(M_{\text {A}}^{\text {RES}}\) is different and the value for \(M_{\text {A}}^{\text {RES}}\) is less suppressed than in fits with RES normalization free (which has no \(Q^{2}\) dependence).

The best fit distributions for the GENIE v2.8.2 (RES) fit are shown for the \(\nu _{\mu }p\rightarrow \mu ^{-}p\pi ^{+}\) distributions in Fig. 5, for the \(\nu _{\mu }n\rightarrow \mu ^{-}p\pi ^{0}\) distributions in Fig. 6, and for the \(\nu _{\mu }n\rightarrow \mu ^{-}n\pi ^{+}\) distributions in Fig. 7. Data points with \(Q^{2}\) \(\le 0.1\) GeV\(^{2}\) are shown but not included in the \(\chi ^{2}\) calculation.

4.3 Goodness of fit

As has been previously remarked, the \(\chi ^{2}\)/\(N_{\mathsf {DOF}}\) is not an appropriate measure of the goodness of fit for Eq. 2 as it involves Poisson-likelihood terms with low-statistics bins. In Fig. 8, the expected \(\chi ^2\) distribution has been produced by making 100,000 toy experiments in which a fake dataset has been produced with statistical errors thrown for all 12 datasets included in the fit. The \(\chi ^2\) is calculated between each toy experiment and the nominal data (without thrown statistical errors). The p-value of any fit can be calculated by integrating the distribution to the right of any given \(\chi ^{2}_\mathrm{min}\) fit value as the \(\chi ^{2}\) distribution is independent of the fit type. Figure 8 shows the sampling distribution from this method, with the actual fit value for GENIE v2.8.2 (\(F_{\mathsf {A}}(0)\)), which has a p-value \(\ll 10^{-4}\), indicating a poor fit between data and the model, even after the fit procedure.

In Table 3, the contribution that each of the 12 datasets makes to the nominal and best fit \(\chi ^{2}\) values is given for the GENIE v2.8.2 (RES) fit. The \(N_{\mathsf {DOF}}\) contributed by each dataset is also shown for comparison. It is clear that there is a disproportionate contribution from the reanalyzed \(E_{\nu }\)-dependent datasets for the subdominant channels (\(\nu _{\mu }n\rightarrow \mu ^{-}p\pi ^{0}\) and \(\nu _{\mu }n\rightarrow \mu ^{-}n\pi ^{+}\)). The uncertainty on these distributions only includes statistical errors, which are dominant, but there are significant normalization uncertainties due to detector corrections and background subtractions which are not included. These corrections are also likely to have an effect on the shape of the distributions, but it is not possible to calculate meaningful shape-uncertainties for these effects (nor are they included in the published results from ANL or BNL). It should also be noted that there is a correlation between the three \(E_{\nu }\)-dependent datasets for both ANL and separately for BNL, introduced by the procedure for reanalyzing the datasets, although this is unlikely to be a significant issue. These issues are discussed further in Appendix A.

To ensure that including the four subdominant \(E_{\nu }\)-dependent datasets (\(\nu _{\mu }n\rightarrow \mu ^{-}p\pi ^{0}\) and \(\nu _{\mu }n\rightarrow \mu ^{-}n\pi ^{+}\) for both ANL and BNL) does not badly bias the results, the fit was repeated without these four problematic datasets included. The fit results are within 1\(\sigma \) of the values in Table 2, which indicates that these datasets do not bias the fit strongly. Indeed, the \(Q^{2}\)-dependent and \(E_{\nu }\)-dependent distributions for each channel and experiment agree reasonably well with each other. When these four datasets are excluded, the p-value returned at the best fit point is more reasonable (\(\sim \)0.02), indicating that the poor quality of fit seen in Fig. 8 can be mostly attributed to these four datasets. As there is no reason to suspect that these datasets are biasing the fit, we prefer to leave these datasets in the fit and present the fit with all 12 datasets included as the main result of this work. Using both \(Q^{2}\)-dependent and \(E_{\nu }\)-dependent datasets helps to break the degeneracy between \(M_{\text {A}}^{\text {RES}}\) and normalization parameters, so there is a strong reason for including all datasets if possible.

Fig. 9
figure 9

The global dataset (described in Sect. 2) is compared with the best fit result and post-fit uncertainties for the GENIE v2.8.2 (RES) fit, for the three single pion production channels investigated in this work. Note that the reanalyzed ANL and BNL data shown have no invariant mass cut in the event selection, whereas the other datasets have an invariant mass cut of \(W \le 2\,\mathrm{GeV}\) applied unless otherwise mentioned. An invariant mass cut of \(W \le 2\,\mathrm{GeV}\) has been applied to the GENIE prediction for this comparison

5 Conclusions

ANL and BNL provide the only neutrino–nucleon data for the energies in the few-GeV region most relevant for current and future oscillation experiments. The large normalization discrepancy between them has led to large uncertainties in pion production parameters, which presents a problem for meeting the stringent error budgets required by current and future oscillation analyses. In this work, we use the reanalyzed ANL and BNL \(\nu _{\mu }p\rightarrow \mu ^{-}p\pi ^{+}\) datasets from Ref. [17], where this normalization discrepancy has been solved, to constrain the GENIE single pion production model parameters. The reanalysis method from Ref. [17] is applied to the subdominant pion production channels \(\nu _{\mu }n\rightarrow \mu ^{-}p\pi ^{0}\) and \(\nu _{\mu }n\rightarrow \mu ^{-}n\pi ^{+}\), and \(Q^{2}\)-dependent distributions for all three channels for both ANL and BNL are also used in the fits. Although the GENIE single pion model is not state of the art, it is widely used by many currently running experiments, so improvements to the parametrization are of interest to the community. Additionally, the fits described here provide a blueprint for their use in constraining other models.

We find that the uncertainty on the variable model parameters can be significantly reduced with respect to the nominal GENIE parameter uncertainties [45], which were necessarily large to cover the disagreement between the published ANL and BNL datasets. A similar conclusion was found in the context of a different model in Ref. [51]. The retuned uncertainties on these parameters should be used by neutrino oscillation and interaction experiments. To obtain good agreement with the data it was necessary to significantly reduce the non-resonant background normalization from the GENIE nominal prediction. The result of the GENIE v2.8.2 (RES) fit is compared to the global \(E_{\nu }\)-dependent data for the three single pion production channels of interest in Fig. 9. Most of the higher energy datasets shown (described in Sect. 2) have an invariant mass cut of \(W \le 2\,\mathrm{GeV}\) applied, so the same invariant mass cut has been applied to the GENIE prediction shown. Note that the reanalyzed ANL and BNL data have no invariant mass cut applied, which should be borne in mind when interpreting Fig. 9.

We note that the recent coherent pion cross section results from MINER\(\nu \)A  [52] found a discrepancy between data and GENIE in a single pion production-dominated background sample that required significant reductions in the prediction, which may be alleviated by a reduction in the non-resonant single pion contribution as found in the fits presented here. We also note that the recent NO\(\nu \)A results [53] found a discrepancy between the hadronic energy distribution observed at the near detector and their GENIE simulation. There were more events in data where the hadronic system had less recoil, and fewer with high recoil, compared to the GENIE prediction. Although the discrepancy is treated as calibration effect in the NO\(\nu \)A analysis, it is more likely to be due to deficiencies in the GENIE cross section model. The retuned pion set of production parameters described in this work will ameliorate the NO\(\nu \)A discrepancy because it reduces the non-resonant pion production component, which will contribute events where the recoiling hadronic system has a lot of energy.

In this work, all available neutrino–nucleon single pion production data for neutrino energies below 10 GeV has been used to constrain the pion production parameters, including the reanalyzed ANL and BNL \(E_{\nu }\) data for the first time, which is a significant step forward toward reducing the cross section uncertainties on this channel to the level required for future neutrino oscillation experiments. Recent proposals to extract neutrino–proton pion production cross sections from experiments where the target material contains hydrogen [54] raise the possibility of new data, which will further reduce the parameter uncertainties.