1 Introduction

The Tokai to Kamioka (T2K) experiment produces a beam of predominantly muon neutrinos by impinging protons from an accelerator onto a target, using magnetic horns to direct the outgoing collision products which thereafter decay into the neutrinos that form the beam. A suite of near detectors, 280 m downstream of the production target, characterise the neutrinos before long-baseline oscillations take effect, and a far detector, \(295~\text {km}\) away, measures the long-baseline oscillations. This paper first introduces the neutrino oscillation formalism in Sect. 1 and summarises the T2K experiment in Sect. 2. Section 3 outlines the updates to the previous analysis [1, 2], with the systematic uncertainties presented in detail in Sect. 4 for the neutrino flux, and in Sect. 5 for the neutrino interaction model. The analysis of near-detector data, which constrains the majority of the systematic uncertainties in the oscillation analysis, is described in Sect. 6. The far-detector selections are described in Sect. 7, and the new constraints on the oscillation parameters are presented in Sect. 8. Section 9 summarises the simulated data studies, which act to increase the uncertainty on the oscillation parameters by studying the impact of alternative interaction models. The results are summarised in Sect. 10, and the data release, amongst other supplementary material, is provided in the appendices.

The observation of neutrino survival probabilities changing as a function of both flavour and distance travelled was established in the late 1990s by Super-Kamiokande (SK) [3]. Their measurements of neutrinos produced by cosmic rays in the atmosphere found that muon neutrinos disappeared after travelling through the Earth, whereas electron neutrinos did not. A few years later, the Sudbury Neutrino Observatory (SNO) found evidence that neutrino flavour change was responsible for the measured deficit of electron neutrinos compared to what was predicted from the Sun [4]. Neutrino flavour changing was also confirmed using artificial sources of neutrinos in the long-baseline reactor experiment KamLAND [5] which measured the disappearance of \(\overline{\nu } _{e}\), and accelerator experiments K2K [6] and MINOS [7] which measured the disappearance of \(\nu _{\mu }\) and \(\overline{\nu } _{\mu }\). These experiments additionally characterised the oscillation curve in the ratio of the distance travelled over the neutrino energy, L/E,  which governs the oscillation probability. The results can be summarised in a framework with three active neutrinos, where at least two neutrinos have non-zero mass. The flavour and mass eigenstates of the neutrinos, \(\left| \nu _l\right\rangle \) and \(\left| \nu _i\right\rangle \) respectively, are separate and can be related by a \(3\times 3\) unitary mixing matrix U as \(\left| \nu _l\right\rangle =U\left| \nu _i\right\rangle .\) The mixing matrix is the Pontecorvo–Maki–Nakagawa–Sakata (PMNS) matrix, which can be parametrised by three mixing angles, \(\theta _{12}\), \(\theta _{23}\), \(\theta _{13}\), and a CP-violating phase, \(\delta _{\scriptscriptstyle \textrm{CP}}\)  [8, 9]. The probabilities for neutrino flavour oscillations can then be expressed as functions of these mixing angles and the mass-squared differences, \(\varDelta m^2_{ij}=m^2_i-m^2_j\) where \(m_i\) is the mass of the ith neutrino mass eigenstate. 
The \(m_2>m_1\) ordering was established by measurements of solar neutrinos across multiple experiments [10]. The ordering of the remaining mass states is unknown, with \(m_3>m_2>m_1\) referred to as the normal ordering (NO), and \(m_2>m_1>m_3\) as the inverted ordering (IO). This analysis uses the Particle Data Group (PDG) [11] convention for the order of the mixing matrices, \(U=U_{23} \otimes U_{13} \otimes U_{12}.\)
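As an illustration of the PDG ordering quoted above, the following minimal Python sketch builds U from the three rotations and checks that the result is unitary. The numerical mixing parameters are illustrative assumptions rather than results of this analysis, and the function names are hypothetical.

```python
import cmath
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def pmns_matrix(theta12, theta23, theta13, delta_cp):
    """Return U = U23 * U13 * U12 (PDG ordering) as a 3x3 nested list."""
    s12, c12 = math.sin(theta12), math.cos(theta12)
    s23, c23 = math.sin(theta23), math.cos(theta23)
    s13, c13 = math.sin(theta13), math.cos(theta13)
    phase = cmath.exp(1j * delta_cp)
    u23 = [[1, 0, 0], [0, c23, s23], [0, -s23, c23]]
    # The CP-violating phase enters through the 1-3 rotation.
    u13 = [[c13, 0, s13 / phase], [0, 1, 0], [-s13 * phase, 0, c13]]
    u12 = [[c12, s12, 0], [-s12, c12, 0], [0, 0, 1]]
    return matmul3(matmul3(u23, u13), u12)


def row_norms(u):
    """Sum of |U_li|^2 over mass states i, for each flavour l (1 if unitary)."""
    return [sum(abs(x) ** 2 for x in row) for row in u]


# Illustrative inputs (assumed values, not taken from this analysis).
EXAMPLE_U = pmns_matrix(math.asin(math.sqrt(0.307)),
                        math.asin(math.sqrt(0.55)),
                        math.asin(math.sqrt(0.022)),
                        -1.6)
```

In this parametrisation the element linking the electron flavour to the third mass state has modulus squared equal to \(\sin ^2\theta _{13}\), which is a convenient sanity check on the ordering of the rotations.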

The results from SK, SNO, and KamLAND showed that both \(\theta _{23}\) and \(\theta _{12}\) were non-zero. The last mixing angle, \(\theta _{13}\), was indicated to be non-zero through T2K’s \(2.5\sigma \) measurement of \(\nu _{\mu } \rightarrow \nu _{e} \) [12]. It was later precisely measured by short-baseline experiments Daya Bay [13], RENO [14], and Double Chooz [15], observing the disappearance of \(\overline{\nu } _{e}\) from nuclear reactors. The long-baseline accelerator experiments T2K and NOvA subsequently observed the appearance of \(\nu _{e}\) in a \(\nu _{\mu }\) beam at high significance [16, 17], and NOvA observed \(\overline{\nu } _{e}\) appearance in a \(\overline{\nu } _{\mu }\) beam at \(4.4\sigma \) [18]. The non-zero \(\theta _{13}\) mixing angle implies that a measurement of \(\delta _{\scriptscriptstyle \textrm{CP}}\) is possible in long-baseline accelerator-based experiments by measuring the appearance of \(\nu _{e}\) and \(\overline{\nu } _{e}\) in \(\nu _{\mu }\) and \(\overline{\nu } _{\mu }\) beams, respectively.

On its way to the far detector, the beam passes through matter, and the presence of electrons modifies the oscillation probabilities compared to those in vacuum. Namely, charged-current elastic scattering on electrons is possible for \(\nu _{e}\) and \(\overline{\nu } _{e}\) (hereafter referred to as \(\overset{\scriptscriptstyle (-)}{\nu }_{e}\)), but not for the other flavours [19, 20]. The sign of the matter effect differs for neutrinos and anti-neutrinos, and the magnitude is a function of the density of electrons in the path of the neutrinos, \(n_e,\) the weak interaction coupling strength, \(G_F,\) and the neutrino energy, E. The matter effect in the Sun was central to measuring \(m_2>m_1.\) The probability for \(\overset{\scriptscriptstyle (-)}{\nu }_{\mu } \rightarrow \overset{\scriptscriptstyle (-)}{\nu }_{e}\) appearance as a function of neutrino energy, E, and baseline, L, including a first-order approximation of the matter effects, is [21]

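$$\begin{aligned} P\left( \overset{\scriptscriptstyle (-)}{\nu }_{\mu } \rightarrow \overset{\scriptscriptstyle (-)}{\nu }_{e}\right) \approx {}&\sin ^2\theta _{23}\frac{\sin ^2 2\theta _{13}}{(A-1)^2}\sin ^2\left[ (A-1)\varDelta _{31}\right] \\ &\overset{\scriptscriptstyle (+)}{-}\,\alpha \frac{J_0\sin \delta _{\scriptscriptstyle \textrm{CP}}}{A(1-A)}\sin \varDelta _{31}\sin (A\varDelta _{31})\sin \left[ (1-A)\varDelta _{31}\right] \\ &+\alpha \frac{J_0\cos \delta _{\scriptscriptstyle \textrm{CP}}}{A(1-A)}\cos \varDelta _{31}\sin (A\varDelta _{31})\sin \left[ (1-A)\varDelta _{31}\right] \\ &+\alpha ^2\frac{\cos ^2\theta _{23}\sin ^2 2\theta _{12}}{A^2}\sin ^2(A\varDelta _{31}) \end{aligned}$$
(1)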

where

$$\begin{aligned} \alpha &= \varDelta m^2_{21} / \varDelta m^2_{31} \\ \varDelta _{ij} &= \varDelta m^2_{ij} L / 4 E \\ A &= (-)\, 2\sqrt{2} G_F n_e E/ \varDelta m^2_{31} \\ J_0 &= \sin 2 \theta _{12}\sin 2 \theta _{13}\sin 2 \theta _{23}\cos \theta _{13}. \end{aligned}$$

The first term in Eq. 1 is proportional to \(\sin ^2\theta _{23}\), which renders the appearance sensitive to whether \(\theta _{23}\) is above or below \(\pi /4,\) referred to as the octant of \(\theta _{23}\). This in turn determines whether the \(\nu _3\) mass eigenstate has a larger admixture of \(\nu _{\mu }\) or \(\nu _{\tau }\). The term containing \(\sin \delta _{\scriptscriptstyle \textrm{CP}} \) in Eq. 1 has the opposite sign for neutrinos and anti-neutrinos, and allows for CP symmetry violation if \(\delta _{\scriptscriptstyle \textrm{CP}}\) is different from 0 or \(\pi .\) The term containing \(\cos \delta _{\scriptscriptstyle \textrm{CP}} \) does not violate CP symmetry, but can change the shape of the appearance energy spectrum, and is important for precisely measuring \(\delta _{\scriptscriptstyle \textrm{CP}}\). In T2K, the term proportional to \(\sin \delta _{\scriptscriptstyle \textrm{CP}} \) can change the appearance probability by as much as \(\pm 30\%\) given the current knowledge of the other mixing angles. \(J=J_0\sin \delta _{\scriptscriptstyle \textrm{CP}} \) is referred to as the Jarlskog invariant [22, 23] and is a basis-independent measure of CP violation. This analysis presents T2K’s constraints on \(\varDelta {}m^2_{32}\), \(\sin ^2\theta _{23}\), \(\sin ^2\theta _{13}\), \(\delta _{\scriptscriptstyle \textrm{CP}}\), J, and the mass ordering.
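To make the roles of the individual terms concrete, the sketch below evaluates an appearance probability numerically. It assumes the commonly used first-order expansion in \(\alpha \) with a constant matter density, written with the quantities \(\alpha ,\) \(\varDelta _{31},\) A and \(J_0\) defined above, and it assumes normal ordering. All default parameter values (mixing angles, mass splittings, a crust density of 2.6 g/cm\(^3\) with an electron fraction of 0.5) are illustrative assumptions and not results of this analysis. The numerical factors 1.267 and \(1.52\times 10^{-4}\) are the usual unit conversions for \(\varDelta m^2\) in eV\(^2,\) L in km and E in GeV.

```python
import math


def appearance_probability(energy_gev, delta_cp, antineutrino=False,
                           baseline_km=295.0,
                           sin2_th12=0.307, sin2_th23=0.55, sin2_th13=0.022,
                           dm2_21=7.5e-5, dm2_31=2.5e-3,
                           density_g_cm3=2.6, electron_fraction=0.5):
    """Approximate muon-to-electron (anti)neutrino appearance probability.

    First-order expansion in alpha with constant-density matter effects.
    All default parameter values are illustrative assumptions.
    """
    sign = -1.0 if antineutrino else 1.0
    th12 = math.asin(math.sqrt(sin2_th12))
    th13 = math.asin(math.sqrt(sin2_th13))
    th23 = math.asin(math.sqrt(sin2_th23))
    alpha = dm2_21 / dm2_31
    # Delta_31 = dm2_31 * L / (4E), with dm2 in eV^2, L in km, E in GeV.
    delta31 = 1.267 * dm2_31 * baseline_km / energy_gev
    # A = (-) 2*sqrt(2)*G_F*n_e*E / dm2_31; the sign flips for antineutrinos.
    a = sign * 1.52e-4 * electron_fraction * density_g_cm3 * energy_gev / dm2_31
    j0 = (math.sin(2 * th12) * math.sin(2 * th13)
          * math.sin(2 * th23) * math.cos(th13))
    interference = (math.sin(a * delta31) * math.sin((1 - a) * delta31)
                    / (a * (1 - a)))
    leading = (sin2_th23 * math.sin(2 * th13) ** 2
               * math.sin((a - 1) * delta31) ** 2 / (a - 1) ** 2)
    # CP-odd term: opposite sign for neutrinos and antineutrinos.
    cp_odd = (-sign * alpha * j0 * math.sin(delta_cp)
              * math.sin(delta31) * interference)
    cp_even = alpha * j0 * math.cos(delta_cp) * math.cos(delta31) * interference
    solar = (alpha ** 2 * math.cos(th23) ** 2 * math.sin(2 * th12) ** 2
             * math.sin(a * delta31) ** 2 / a ** 2)
    return leading + cp_odd + cp_even + solar


def first_oscillation_maximum_gev(baseline_km=295.0, dm2_31=2.5e-3):
    """Energy at which Delta_31 = pi/2 (matter effects neglected)."""
    return 1.267 * dm2_31 * baseline_km / (math.pi / 2)
```

With these assumed inputs the first oscillation maximum at 295 km falls close to 0.6 GeV, and for neutrinos the probability is larger at \(\delta _{\scriptscriptstyle \textrm{CP}}=-\pi /2\) than at \(+\pi /2,\) with the ordering reversed for anti-neutrinos.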

2 The T2K experiment

To measure \(\delta _{\scriptscriptstyle \textrm{CP}}\) and the other oscillation parameters, T2K uses a beamline that produces predominantly muon-flavoured neutrinos or anti-neutrinos with a peak energy of \(E_{\nu }\approx 0.6~\text {GeV},\) and has been alternating between neutrino and anti-neutrino configurations since 2014. A suite of near detectors (NDs), approximately 280 m from the beam production target, characterise T2K’s neutrino beam before long-baseline oscillations become likely. The far detector (FD) is 295 km away and measures the appearance of \(\overset{\scriptscriptstyle (-)}{\nu }_{e}\) and the disappearance of \(\overset{\scriptscriptstyle (-)}{\nu }_{\mu }\) in the \(\overset{\scriptscriptstyle (-)}{\nu }_{\mu }\)-dominated beam. The rate and directional stability of the neutrino beam are measured by the on-axis neutrino ND, INGRID. The second ND, ND280, and the FD, SK, are \(2.5^{\circ }\) off-axis with respect to the upstream proton beam that impinges on the neutrino production target. By being placed off-axis, the detectors sample a narrower neutrino energy distribution, peaking near the maximum of the appearance spectrum.
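The narrowing of the energy spectrum off-axis follows from two-body pion decay kinematics. The sketch below is a simplified illustration: it assumes an ideal \(\pi \rightarrow \mu \nu \) decay in flight, treats the angle between the pion and the neutrino as exactly the off-axis angle, and neglects beam divergence and kaon parents. The particle masses are rounded reference values and the function names are hypothetical.

```python
import math

M_PION = 0.13957  # charged pion mass in GeV (rounded)
M_MUON = 0.10566  # muon mass in GeV (rounded)


def neutrino_energy_gev(pion_energy_gev, angle_rad):
    """Neutrino energy from pi -> mu nu, emitted at angle_rad to the pion."""
    pion_momentum = math.sqrt(pion_energy_gev ** 2 - M_PION ** 2)
    return (M_PION ** 2 - M_MUON ** 2) / (
        2.0 * (pion_energy_gev - pion_momentum * math.cos(angle_rad)))


def max_neutrino_energy_gev(angle_rad):
    """Largest neutrino energy reachable at a fixed non-zero angle."""
    return (M_PION ** 2 - M_MUON ** 2) / (2.0 * M_PION * math.sin(angle_rad))
```

On-axis, the neutrino energy grows roughly in proportion to the pion energy, whereas at \(2.5^{\circ }\) it saturates below about 0.7 GeV for any pion energy, so pions over a broad momentum range contribute neutrinos of similar energy.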

2.1 Beamline

The T2K neutrino beam is produced at the Japan Proton Accelerator Research Complex (J-PARC) in Tokai, Ibaraki, by a high-intensity proton beam, incident on a production target [24]. At J-PARC, \(\text {H}^-\) ions from an ion source are accelerated to an energy of 400 MeV in a linear accelerator. Charge-stripping foils convert the beam to \(\text {H}^+\) at injection into the rapid-cycling synchrotron, which accelerates the proton beam to 3 GeV. These protons are then injected into the main ring (MR) synchrotron, where they are accelerated to 30 GeV. The proton beam from the MR consists of eight bunches with width \(\sim 80~\text {ns}\) \((3\sigma ),\) referred to as a “spill”, produced every 2.48 s.

Fig. 1
The protons on target (POT) delivered to T2K by the MR over time, with the beam intensity overlaid. The ND280 analysis uses runs 2 to 9, and the INGRID and FD analyses use runs 1 to 10, with run-by-run POT listed in Table 1

The protons are extracted from the MR to the neutrino beamline, which consists of a series of normal- and super-conducting magnets that are used to bend the proton beam in the direction of the FD, and to focus the beam onto the neutrino production target. The proton beam power, as well as the position, angle, and size of the proton beam at the target, are precisely measured by a series of proton beam monitors [24, 25] installed along the neutrino beamline.

The 30 GeV protons strike a 91.4 cm-long monolithic graphite target installed in the first of three electromagnetic focusing horns. Outgoing charged pions and kaons are focused by these horns, which have been operating at a current of \(\pm 250~\text {kA}\) for nearly the full T2K run to date. The polarity of the horns can be set to focus either positively or negatively charged outgoing particles, and a 96 m-long decay volume is located directly downstream of the focusing system. Positively charged pions decay into positively charged muons and muon neutrinos, whilst negatively charged pions decay into negatively charged muons and muon anti-neutrinos. The former is referred to as \(\nu \)-mode and the latter as \(\overline{\nu }\)-mode. Kaon and muon decays are the primary contributors to the \(\nu _{e}\) contamination in the \(\nu _{\mu }\)-dominated beam.

A beam dump is situated at the end of the decay volume and absorbs surviving hadrons. A muon monitor downstream of the beam dump, MUMON [26], measures the intensity and profile of muons that have more than 5 GeV of energy. This measurement is used as a proxy for stability of the associated neutrino beam. The predicted neutrino fluxes and uncertainties are described in detail in Sect. 4.

The MR proton beam power has reached a maximum of 515 kW, and the protons on target (POT) and power history are shown in Fig. 1. Scheduled upgrades will increase the beam power to 1.3 MW and operate the focusing horns at \(\pm 320\) kA current. This will significantly increase the POT per run cycle and provide more neutrinos at the ND and FD per POT. It will also reduce the \(\overline{\nu } _{\mu }\) and \(\nu _{\mu }\) backgrounds in \(\nu \)-mode and \(\overline{\nu }\)-mode  [27, 28], respectively, referred to as the wrong-sign component.
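The beam power, proton energy and repetition period quoted in this section can be related by simple arithmetic. The sketch below estimates the number of protons per spill, assuming that the 2.48 s cycle applies at the quoted maximum power and that the power is the time average of the energy delivered per spill. The function name is hypothetical.

```python
EV_TO_JOULE = 1.602176634e-19  # joules per electronvolt


def protons_per_spill(beam_power_w, proton_energy_gev, cycle_s):
    """Protons per spill implied by an average beam power."""
    energy_per_proton_j = proton_energy_gev * 1.0e9 * EV_TO_JOULE
    return beam_power_w * cycle_s / energy_per_proton_j


# 515 kW, 30 GeV protons, one spill of eight bunches every 2.48 s.
N_SPILL = protons_per_spill(515.0e3, 30.0, 2.48)
N_BUNCH = N_SPILL / 8
```

Under these assumptions a spill contains a few times \(10^{14}\) protons, shared between the eight bunches.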

2.2 Near detectors

Two NDs are used directly in the oscillation analysis: the on-axis INGRID, and the off-axis ND280. Both detectors are housed in the same pit underground, with the centres of ND280 and INGRID approximately 24 m and 33 m, respectively, below the surface.

Fig. 2
The INGRID on-axis ND, used to measure the neutrino beam profile and rate [29]. The beam direction is shown as into the paper

INGRID [29] is designed to measure the profile and stability of the neutrino beam. It samples the beam spill-by-spill over a transverse cross section of \(10\times 10~\text {m}^2\) using 14 identical modules arranged as a cross, as shown in Fig. 2. Each module alternates iron target plates of 6.5 cm thickness with tracking scintillator planes of 1 cm thickness, for a total of 9 iron plates and 11 scintillator planes, and is surrounded by scintillator planes acting as vetoes. A module exposes a \(1.24\times 1.24~\text {m}^2\) area facing the beam, and provides a 7.1 t target mass. INGRID measures the beam direction with an accuracy better than 0.4 mrad, within the required precision of \(\pm 1~\text {mrad}\) for the oscillation analysis.
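The quoted target mass per module can be cross-checked from the plate dimensions. The sketch below assumes that each iron plate spans the full \(1.24\times 1.24~\text {m}^2\) face and uses a density of 7.874 g/cm\(^3\) for iron; both are assumptions made for this estimate.

```python
IRON_DENSITY_KG_M3 = 7874.0  # assumed density of iron


def module_iron_mass_tonnes(n_plates=9, thickness_m=0.065, side_m=1.24):
    """Iron mass of one module, assuming square plates of the given side."""
    volume_m3 = n_plates * thickness_m * side_m ** 2
    return volume_m3 * IRON_DENSITY_KG_M3 / 1000.0
```

With nine 6.5 cm plates this gives approximately 7.1 t, consistent with the target mass stated in the text.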

Fig. 3
The ND280 off-axis ND, used to measure the neutrino flux and interactions before long-baseline oscillations [24]. The detector coordinates and beam direction are superimposed, and the sub-detectors are labelled accordingly

ND280, hereafter referred to as the ND, is used to constrain the uncertainties on the neutrino flux and interactions in the analysis. It is a magnetised detector consisting of different sub-detectors as shown in Fig. 3. The ND measures \(5.6~\text {m} \times 6.1~\text {m} \times 7.6~\text {m}\) (width \(\times \) height \(\times \) length) around its outer edges, including the magnet. In the coordinate convention, z points along the nominal neutrino beam axis, and x and y are the horizontal and vertical directions, respectively. The refurbished magnet from the UA1 [30, 31] and NOMAD [32] experiments at CERN provides a magnetic field of 0.2 T, and the magnet yoke is instrumented with layers of plastic scintillator called the Side Muon Range Detector (SMRD) [33]. Inside the magnet enclosure there is an electromagnetic calorimeter (ECal) [34] surrounding the inner detector, which is used to distinguish track-like and shower-like objects, and is made of alternating layers of plastic scintillator and lead.

The inner detector region houses the \(\pi ^0\) detector (P\(\emptyset \)D) [35] in the upstream portion, which is made of alternating layers of water bags, brass sheets, and triangular \(x{-}y\) scintillator planes. The water bags can be filled with either water or air. The P\(\emptyset \)D has its own ECal modules upstream and downstream of the water target region, made from alternating scintillator planes and lead sheets. The P\(\emptyset \)D, ECal and SMRD also act as vetoes for interactions originating outside the detector, e.g. cosmic-ray muons and neutrino interactions in the sand upstream of the detector hall. Downstream in the direction of the FD, there are two Fine-Grained Detectors (FGDs) [36], which are each sandwiched by Time Projection Chambers (TPCs) [37]. These sub-detectors are together referred to as the “tracker”. The most upstream FGD (FGD1) is made of 15 polystyrene scintillator modules. One module is \(186.4~\text {cm}\times 186.4~\text {cm}\times 2.02~\text {cm}\) and consists of two scintillator layers oriented in x and y, with each layer containing 192 square bars, each 9.6 mm wide and approximately 2 m long, which are read out at one end. The second FGD (FGD2) contains six passive water modules, each sandwiched by polystyrene scintillator modules identical to those in FGD1. The TPCs use an \(\text {Ar}{:}\text {CF}_4{:}i\text {C}_4\text {H}_{10}\) gas mixture in a 95:3:2 concentration, and have a space point resolution of approximately 1 mm.

This analysis selects interactions occurring in either FGD, using the FGDs and TPCs for track reconstruction and particle identification. The selection is detailed in Sect. 6.1. The FGDs are capable of tracking charged particles, performing particle identification, and calculating momentum-by-range for contained particles. The TPCs are three-dimensional trackers which measure momentum through the curvature of the tracks in the magnetic field, with a resolution of \(\frac{\delta p_{\perp }}{p_{\perp }} \sim 0.1 p_{\perp },\) where \(p_{\perp }\) is the momentum perpendicular to the magnetic field. The TPCs also provide excellent particle identification.
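The link between track curvature and momentum can be illustrated with the standard relation \(p_{\perp }\,[\text {GeV}/c] \approx 0.3\,B\,[\text {T}]\,R\,[\text {m}]\) for a unit-charge particle. The sketch below uses the 0.2 T field quoted for the ND magnet; the chord length over which the track is measured is a hypothetical input, since the TPC dimensions are not given here, and the sagitta uses the small-angle approximation.

```python
GEV_PER_TESLA_METRE = 0.299792458  # p_perp [GeV/c] = 0.2998 * B [T] * R [m]


def curvature_radius_m(p_perp_gev, field_tesla=0.2):
    """Radius of curvature of a unit-charge track in a uniform field."""
    return p_perp_gev / (GEV_PER_TESLA_METRE * field_tesla)


def sagitta_mm(p_perp_gev, chord_m, field_tesla=0.2):
    """Sagitta of the track arc over a chord (small-angle approximation)."""
    radius_m = curvature_radius_m(p_perp_gev, field_tesla)
    return 1000.0 * chord_m ** 2 / (8.0 * radius_m)
```

For a 1 GeV/c track the radius is roughly 17 m, so the sagitta over a metre-scale chord is only several millimetres, which indicates why a space point resolution at the millimetre level matters for the momentum measurement.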

Table 1 Collected protons-on-target (POT) for each T2K run included in the analysis of T2K data at the ND and FD. The recorded POT at INGRID closely follows that of the FD

2.3 Far detector

The Super-Kamiokande (SK) detector [24, 38] is the far detector (FD) for T2K. SK is a large water Cherenkov detector located 295.3 km from the neutrino production target with a 2.7 km water-equivalent overburden. It is filled with 50 kt of ultrapure water that is optically separated into an inner detector, ID, which forms the primary target for neutrino interactions, and an outer detector, OD, which serves to veto external backgrounds.

The ID is instrumented with 11,129 inward-facing photomultiplier tubes (PMTs) with 20-inch diameter, providing a total photocathode coverage of 40%. The OD is instrumented with 1885 8-inch outward-facing PMTs, which are connected to wavelength shifting plates and are attached to the same stainless steel structure that houses the ID PMTs. The structure is offset 2 m from the wall of the OD and there is a 55 cm dead region between the ID and OD surfaces.

Charged particles are detected by their Cherenkov ring pattern, and events are classified by the number of primary rings, the ring pattern of each ring, and the number of time-delayed electron rings consistent with a muon decay, hereafter referred to as “Michel electrons”. This analysis selects single-ring (“1R”) events, where the ring is either electron-like (1R\(e\)) or muon-like (1R\(\mu \)), with a selection-dependent cut on the number of delayed Michel electrons (“d\(e\)”). The FD selections are detailed in Sect. 7.

The data used in this analysis were taken over two different periods of SK detector operations and span the years 2010–2020. Of the \(36.01\times 10^{20}\) POT reported here, \(31.29\times 10^{20}\) (runs 1–9) were collected in 2010–2018, during what is referred to as the SK-IV period. In June 2018, SK detector operations were stopped for refurbishment in preparation for the gadolinium (Gd) loading of the water target for the SK-Gd project [39, 40]. During this work the detector surfaces were cleaned to remove rust and other impurities, detector walls were repaired to fix minor leaks, and failed PMTs were replaced in the ID and OD. The detector period following this refurbishment is referred to as SK-V.

SK-V resumed data taking in January 2019 with ultrapure water and collected \(4.73\times 10^{20}\) POT during October 2019–February 2020 (run 10). These data were collected entirely in \(\nu \)-mode, resulting in a total of \(19.66\times 10^{20}\) and \(16.34\times 10^{20}\) POT available for analysis in the \(\nu _{\mu }\) and \(\overline{\nu } _{\mu }\) modes, respectively. For a detailed breakdown of the POT in each run period, consult Table 1. Gadolinium loading commenced in July 2020, and this analysis does not include such data.
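The POT bookkeeping in this section can be recomputed from the rounded values quoted in the text, in units of \(10^{20}\) POT. The sketch below is only an arithmetic cross-check with hypothetical variable names. Small differences in the last digit are expected from rounding.

```python
# Rounded values quoted in the text, in units of 1e20 POT.
POT_NU_MODE = 19.66
POT_ANTINU_MODE = 16.34
POT_RUNS_1_TO_9 = 31.29
POT_RUN_10 = 4.73  # collected entirely in nu-mode

total_by_mode = POT_NU_MODE + POT_ANTINU_MODE
total_by_period = POT_RUNS_1_TO_9 + POT_RUN_10

# Relative increase from run 10, for nu-mode alone and for all data.
previous_nu_mode = POT_NU_MODE - POT_RUN_10
increase_nu_mode = POT_RUN_10 / previous_nu_mode
increase_overall = POT_RUN_10 / POT_RUNS_1_TO_9
```

With these inputs the \(\nu \)-mode exposure grows by close to a third and the overall exposure by about 15%.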

Table 2 Percentage of hadronic interactions in the target and downstream beam line for which external measurements are used in the tuning or uncertainty evaluation. The interactions are weighted by their contribution to the neutrino flux at the FD, separated into different horn focusing modes and neutrino flavours

3 Updates from previous analysis

This section provides an overview of the improvements to T2K’s previously published oscillation analysis [1, 2], which are detailed in the subsequent sections.

  • Data at the FD: The data at INGRID and the FD increased by \(4.73\times 10^{20}\) POT (+32%) in \(\nu \)-mode, increasing the overall amount by 15%, detailed in Sect. 7.

  • Data at the ND: The data at the ND increased by \(5.73\times 10^{20}\) POT (+99%) in \(\nu \)-mode, and by \(4.48\times 10^{20}\) POT (+116%) in \(\overline{\nu }\)-mode, increasing the overall amount by 106%, detailed in Sect. 6.

  • Selections at the ND: The increased data allowed for refining the \(\overline{\nu }\)-mode selections and re-binning all existing selections, improving the constraints on the systematic uncertainties from the ND in the oscillation analysis, detailed in Sect. 6.1.

  • FD reprocessing: An updated model for the dark rate and gain drift in the PMTs had a slight impact on the reconstruction and the number of observed data events. The reprocessing introduced one more \(\overline{\nu }\)-mode electron-like event and three fewer \(\overline{\nu }\)-mode muon-like events, and had no overall effect on the \(\nu \)-mode samples, detailed in Sect. 7.

  • Neutrino flux model: The neutrino flux was constrained using charged pion production data on a replica of the T2K production target from NA61/SHINE [41]. Data on a thin target [42] was also used when appropriate. This reduced the flux uncertainties before the ND analysis from \(\sim 9\%\) down to \(\sim 5\%\) in the neutrino flux peak, detailed in Sect. 4.

  • Neutrino interaction model: Several changes to the neutrino interaction model were made. The largest changes were switching to a more sophisticated spectral-function based nuclear model [43] for charged-current quasi-elastic (CCQE) interactions, introducing an additional uncertainty due to nuclear effects in the four-momentum transferred to the nucleus \((Q^2),\) and adding an uncertainty for the nucleon removal energy. The nuclear-cascade model for pions was tuned to external data [44], and the FD parametrisation was constrained by the fit to ND data, whereas it was previously allowed to vary separately. The interaction models for pions re-scattering within the detector at the ND and FD were unified, and are identical to the pion final-state interaction model, detailed in Sect. 5. However, constraints on re-scattering within the ND were not propagated to re-scattering at the FD, as the uncertainties were kept uncorrelated.

4 Neutrino flux model

This is the first T2K oscillation analysis to use hadron production measurements made on a replica of the T2K target by the NA61/SHINE experiment at CERN [41]. The method for predicting the neutrino flux and propagating the associated uncertainties remains the same as in previous results [1, 2, 45]. FLUKA 2011.2x [46, 47] is used to simulate interactions inside the target. The outgoing particles from the target, which later decay to neutrinos, are tracked through the horn field using the GEANT3-based JNUBEAM package [48].

The predictions for pions exiting the target’s surface are tuned to \(\pi ^+\) and \(\pi ^-\) yields measured by the NA61/SHINE experiment, using data collected in 2009 with a replica of the T2K production target [41]. Pions that leave the target and are within the phase space covered by the replica target data, which accounts for about 90% of the neutrinos at the flux peak, are given a weight

$$\begin{aligned} w(p,\theta ,z,i) = \frac{{\textrm{d}}n^{{\textrm{NA}}61}(p,\theta ,z,i)}{{\textrm{d}}n^{\textrm{MC}}(p,\theta ,z,i)} \end{aligned}$$
(2)

where \({\textrm{d}}n\) is the POT-normalised differential yield for data (“NA61”) and simulation Monte-Carlo (“MC”), with exiting momentum p,  polar angle \(\theta ,\) and longitudinal position z along the target for an exiting particle of type \(i = \{\pi ^+,\pi ^-\}.\) For the particles leaving the target, no additional tuning weight is applied for any of the interactions or trajectories inside the target. Simulations for particles that are not covered by the replica target data, and interactions occurring outside the target, are tuned to NA61/SHINE data on \(\pi ^{\pm },\) \(K^{\pm },\) \(K^0_s,\) \(\Lambda ,\) and p yields from a thin target taken in 2009 [42], and other external measurements, applying the same method as previous T2K analyses [45]. The percentage of hadronic interactions which are tuned by external data is shown in Table 2.
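The weight in Eq. 2 is a bin-by-bin ratio of measured to simulated yields. The sketch below shows one way such weights could be tabulated. The binning, the dictionary layout and the example yields are hypothetical, and the fallback weight of 1 for bins without replica-target coverage is a simplification, since the text states that such particles are instead tuned with thin-target data.

```python
def tuning_weights(data_yields, mc_yields):
    """Return w = dn_data / dn_MC for every bin present in the simulation.

    Keys are (p_bin, theta_bin, z_bin, particle) tuples. Bins without data
    coverage, or with an empty MC prediction, keep a weight of 1 here.
    """
    weights = {}
    for key, mc in mc_yields.items():
        data = data_yields.get(key)
        weights[key] = data / mc if data is not None and mc > 0.0 else 1.0
    return weights


# Hypothetical POT-normalised differential yields, for illustration only.
EXAMPLE_DATA = {(0, 0, 0, "pi+"): 1.10e-3, (0, 1, 0, "pi-"): 0.45e-3}
EXAMPLE_MC = {(0, 0, 0, "pi+"): 1.00e-3, (0, 1, 0, "pi-"): 0.50e-3,
              (5, 5, 5, "pi+"): 2.0e-4}
EXAMPLE_WEIGHTS = tuning_weights(EXAMPLE_DATA, EXAMPLE_MC)
```

Each simulated pion leaving the target would then be assigned the weight of the bin it falls into, and that weight would be carried through to the neutrinos it produces.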

In the previous thin-target tuning, a large uncertainty on the cross section of proton production was assigned. In the replica-target based tuning, this uncertainty is no longer necessary for particles covered by the replica target data, because the exiting particle yields can be tuned directly without referring to the interaction history inside the target. The uncertainties from NA61/SHINE are then incorporated with the uncertainties associated with the proton beam profile and out-of-target interactions to give the total uncertainty.

Fig. 4
The predicted unoscillated neutrino fluxes at the FD in \(\nu \)-mode (top) and \(\overline{\nu }\)-mode (bottom). The \(\nu _{e}\) and \(\overline{\nu } _{e}\) components are scaled by \(\times 100.\) The solid lines show the predictions after tuning to NA61/SHINE data on the T2K replica target, and the dotted grey lines show the predictions in the previous T2K analysis [1, 2], tuned to thin target hadron production data. The bottom inset shows the ratio of the flux from the replica target tuning to the flux from the thin target tuning

Fig. 5
The predicted unoscillated neutrino fluxes at the FD in \(\nu \)-mode (top) and \(\overline{\nu }\)-mode (bottom) in logarithmic scale with an extended \(E_{\nu }\) range, after the tuning to NA61/SHINE data on the T2K replica target

Fig. 6
Uncertainty on the right-sign flux in \(\nu \)-mode (top) and right- (middle) and wrong-sign (bottom) fluxes in \(\overline{\nu }\)-mode, broken down by the sources of uncertainty. The solid black line shows the total flux uncertainty in this analysis, and the dashed black line shows the total uncertainty for the previous T2K analysis [1, 2], which used NA61/SHINE thin target data. The grey shaded region shows the shape of the neutrino flux

For the unconstrained interactions not covered by thin- or replica-target data, a systematic uncertainty is calculated by dividing the kinematic phase space parametrised by Feynman-\(x_{\textrm{F}}\) and transverse momentum, \(p_{\textrm{T}},\) into six regions. A 50% fully correlated normalisation uncertainty and a 50% shape uncertainty uncorrelated between the regions are assigned. The size of the uncertainty is motivated by comparing the hadron interaction models in FLUKA 2011.2c [46, 47] and the GEANT 4.10.03 [49] FTFP_BERT and FTF_BIC physics lists.
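One simple way to picture this prescription is as a fractional covariance matrix over the six regions, with a common term for the fully correlated normalisation and a diagonal term for the uncorrelated shape part. The sketch below encodes that reading. It is an interpretation for illustration and may differ from how the uncertainty is implemented in the analysis.

```python
import math


def region_covariance(n_regions=6, norm_frac=0.5, shape_frac=0.5):
    """Fractional covariance: correlated normalisation plus diagonal shape."""
    return [[norm_frac ** 2 + (shape_frac ** 2 if i == j else 0.0)
             for j in range(n_regions)]
            for i in range(n_regions)]


def correlation(cov, i, j):
    """Correlation coefficient between regions i and j."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


EXAMPLE_COV = region_covariance()
```

In this reading each region carries a total fractional uncertainty of about 71%, and any two regions are 50% correlated.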

The predicted flux distributions are provided in Ref. [50] and are shown for the FD in Fig. 4. The largest difference compared to the previous neutrino flux prediction is the reduction of the \(\nu _{\mu }\) component in \(\nu \)-mode, and the \(\overline{\nu } _{\mu }\) component in \(\overline{\nu }\)-mode (“right-sign”), by 5–10% around the flux peak. Due to the large uncertainty on the hadron interactions in the previous tuning, this difference was covered by the flux uncertainties. To more clearly see wrong-sign and background contributions, the predicted neutrino flux spectra are also shown in logarithmic scale and for a wider range of energies in Fig. 5.

Overall, tuning with the NA61/SHINE 2009 replica target data reduces the uncertainty from 9 to 5% near the flux peak, as shown in Fig. 6. In future T2K analyses, outgoing kaons will also be tuned using NA61/SHINE T2K replica target data from 2010, published in 2019 [51]. This will reduce the flux uncertainty at higher energies to \(\sim 5\%.\) With a reduced uncertainty contribution from hadron production errors, uncertainties coming from other sources are now dominant in some energy regions. In particular, uncertainties on the proton beam profile and neutrino beam off-axis angle significantly contribute to the uncertainty on the high-energy edge of the flux peak, since the width of the energy spectrum is directly affected by shifts in the off-axis angle.

5 Neutrino interaction model

Measurements of neutrino oscillations at T2K rely on comparing the neutrino interaction rates at the ND and the FD as a function of the incoming neutrino energy and flavour. These are determined from the observed products of neutrinos interacting with the nuclei inside the detectors, which requires a model to translate what is observed in the detector to information about the neutrino that interacted. Neutrino interaction uncertainties impact the oscillation analysis by changing the expected rate of neutrino interactions, altering the accuracy of the neutrino energy reconstruction, and complicating the extrapolation of model constraints from the ND to the FD. More details can be found in Refs. [1, 52,53,54].

The neutrino interaction model has been significantly improved since the last analysis [1]. This section first provides an overview of the components of the model and then discusses the associated uncertainties and their parametrisations. As briefly mentioned in Sect. 2 and detailed further in Sects. 6.1 and 7, this analysis selects charged-current (CC) neutrino interaction events and has no dedicated neutral-current (NC) selections. The oscillation analysis at the FD specifically selects single-ring events and the model focuses on the treatment of such interactions. In these interactions, CCQE and 2p2h are the main contributors and are discussed next. Neutrino interactions in which a single pion is produced and the pion is missed – either due to its kinematics or by it being absorbed in the nuclear medium – are also an important contributor.

5.1 Base interaction model

Simulations of neutrino interactions are performed with version 5.4.0 of the NEUT neutrino-nucleus interaction event generator [55,56,57]. NEUT takes inputs from a variety of theoretical models for separate neutrino interaction channels. The total cross sections for each channel as a function of neutrino energy, overlaid on the T2K oscillated and unoscillated muon neutrino fluxes, are shown in Fig. 7. An overview of the channels most relevant to this analysis is presented below.

Fig. 7
Neutrino cross sections for muon neutrinos interacting on a water target in NEUT, broken down by interaction mode as a function of neutrino energy. The predictions have been modified from their default to reflect the input model used in the oscillation analysis. The surviving muon neutrino flux as seen by the FD is shown with a white line, and the unoscillated muon neutrino flux as seen by the ND is shown as the grey shaded region. The figure is adapted from Ref. [55]

5.1.1 1p1h

One-particle one-hole (1p1h) interactions describe charged-current quasi-elastic (CCQE) and neutral-current elastic (NCE) neutrino interactions in which a single nucleon from inside a target nucleus is ejected. CCQE interactions, which usually produce single-ring electron-like or muon-like events, are the dominant contributor to the FD event samples, making up roughly 70% of the 1R\(\mu \) selection. In NEUT, 1p1h interactions are modelled according to the scheme presented in Refs. [43, 55], sometimes referred to as the “Benhar Spectral Function” model. This approach relies on the plane wave impulse approximation to factorise the 1p1h cross-section calculation into an expression containing a single-nucleon factor alongside a spectral function (SF). The SF is a two-dimensional distribution describing the probability of finding a nucleon with momentum, \(|{\textbf{p}}|,\) and removal energy, \(E_{rmv},\) which corresponds to the energy required to remove the nucleon from the nuclear potential. This formalism provides a realistic description of the nuclear ground state and is built largely from exclusive measurements of 1p1h interactions in electron scattering, with additional theory-based contributions to describe the role of initial-state correlations between neighbouring nucleons. As an example, the two-dimensional SF for oxygen is shown in Fig. 8, which exhibits the shell structure of the nucleus.

Fig. 8
The two-dimensional probability density distribution for the spectral function for oxygen in NEUT [43] (left), and the projection onto the removal energy axis (right). On the left, the darker colour represents a higher probability of finding an initial-state nucleon with a particular removal energy and momentum. The two sharp p-shells at \(E_{rmv}\sim 12~\text {MeV}\) and \(E_{rmv}\sim 18~\text {MeV},\) and the larger diffuse s-shell at \(E_{rmv}\sim 20{-}65~\text {MeV}\) and \(|{\textbf {p}}|<100~\text {MeV/c},\) are visible. The predictions for the shell positions from another model [58] are overlaid on the right with dashed lines, for protons (red) and neutrons (blue). The energy in MeV is labelled for each prediction

The single-nucleon component of the 1p1h cross section uses the BBBA05 [59] description for the vector part of the nucleon form factors, and a simple dipole form for the axial part. The nucleon axial mass parameter appearing in the form factor, \(M_A^{QE},\) is constrained using bubble chamber measurements of neutrino interactions on light nuclear targets, as detailed later in Sect. 5.2.

5.1.2 2p2h

In two-particle two-hole (2p2h) interactions, a neutrino interacts with a correlated pair of nucleons, ejecting both from the nucleus. Although this is not a dominant process at T2K, it usually produces single-ring electron-like or muon-like events in the FD – making up about 12% of the 1R\(\mu \) selection – and is therefore important to the oscillation analysis. As T2K’s neutrino energy estimator is based on the assumption that the interaction was CCQE, applying it to 2p2h events biases the estimated energy. Thus it is crucial that the relative contribution of 2p2h events to the selections, and the bias they cause to the neutrino energy estimator, are well modelled. NEUT describes the charged-current 2p2h cross section and outgoing lepton kinematics with the Nieves et al. model [60]. In this model, the 2p2h cross section peaks in two distinct regions of momentum and energy transfer, referred to as “\(\varDelta \)” and “non-\(\varDelta \)” excitation regions, which each cause distinctly different biases in neutrino energy reconstruction [1]. Neutral-current 2p2h interactions are not simulated in NEUT. Their inclusion would have a negligible impact on the oscillation analysis as such interactions would make a small contribution to an already small NC background, which is assigned large uncertainties.

5.1.3 Single-pion production

Single-pion production (SPP) processes are the dominant contributor for the T2K FD sample that requires a single electron-like ring with one delayed decay electron (referred to as 1R\(e\)1d\(e\) in Sect. 7). The events also contribute to the other event samples when the pion is not observed due to interactions in the detector or the nucleus, or due to reconstruction inefficiencies. SPP at T2K stems mostly from the neutrino-induced excitation of an initial-state nucleon to a baryon resonance that decays into a pion and a nucleon, and makes up about 13% of the 1R\(\mu \) selection. These processes are described in NEUT by the Rein–Sehgal (RS) model [61] in the outgoing hadronic mass region \(W<2.0~\text {GeV},\) with additional improvements to the nucleon axial form factors [62, 63] and the inclusion of the final-state lepton mass in the calculation [64,65,66]. Whilst \(\varDelta (1232)\) excitations are the dominant contributors to the SPP cross section, a total of 18 baryonic resonances are included in addition to a non-resonant process in the mixed isospin channels. Interference between the resonances is incorporated, but not between the resonant and non-resonant components. The initial-state model for SPP interactions in NEUT is a simple relativistic Fermi gas.

Coherent scattering off nuclei also contributes to the SPP cross section, especially at low four-momentum transfer. In this analysis, NEUT models coherent interactions with the Berger–Sehgal model [67], updated from the RS model [68], and includes Rein’s model of diffractive pion production [69].

5.1.4 Deep inelastic scattering

Deep inelastic scattering (DIS) describes neutrino interactions with the quark constituents of nucleons. It is a sub-dominant process in T2K’s oscillation analysis due to the neutrino energy and the single-ring event selections at the FD. The cross section in NEUT is calculated using the GRV98 [70] Parton Distribution Functions (PDFs), which describe the probability to find a quark of a given type with a given value of the Bjorken scaling variables, x and y,  inside the target nucleon. Bodek–Yang (BY) modifications [71, 72] are made to extend the validity of this approach to the relatively low four-momentum transfers, \(Q^2\lesssim 1.5~\text {GeV}^2,\) typical for interactions at T2K.

In NEUT, the modelling of DIS processes begins for interactions with hadronic invariant mass \(W>1.3~\text {GeV}.\) To avoid double counting the aforementioned non-resonant single-pion production, only DIS interactions that produce more than one pion in the final state are considered. The generation of the hadronic state is split depending on W: for interactions with \(W>2~\text {GeV}\) PYTHIA 5.72 [73] is used, whilst for \(W<2~\text {GeV}\) a custom model interpolating between the \(\varDelta (1232)\) and DIS interactions is employed, described in Sect. V C of Ref. [74].

5.1.5 Final-state interactions

The simulated neutrino interaction events produce an outgoing hadronic system at the interaction vertex inside the nucleus, in addition to the outgoing lepton. These hadrons can undergo final-state interactions (FSI) in the nuclear medium. In NEUT, pion FSI are described using the semi-classical intranuclear cascade model by Salcedo and Oset [75, 76], tuned to modern \(\pi -A\) scattering data [44]. Nucleon FSI are described in an analogous cascade model [56]. Within the cascade, the outgoing hadrons are individually stepped through the remnant nucleus where they can elastically scatter, be re-absorbed, undergo charge-exchange processes, and/or emit additional hadrons which are also stepped through the cascade. Amongst other effects, such cascades allow for SPP events to have no observable pions in the final state after FSI, and for 1p1h interactions to appear as pion production interactions.

5.1.6 Coulomb corrections

Following a charged-current neutrino interaction, the electrostatic interaction between the remnant nucleus and the outgoing charged lepton can cause a small shift in the lepton’s momentum. The size of this Coulomb correction has been determined from the analysis of electron scattering data [77] and is implemented as a small shift in the momentum of the outgoing lepton that depends on the nucleus and the lepton flavour. The size of this shift is \(-3.6~\text {MeV}\) \((+2.6~\text {MeV})\) for outgoing \(\mu ^-\) \((\mu ^+)\) from a carbon target, and \(-4.3~\text {MeV}\) \((+3.3~\text {MeV})\) for outgoing \(\mu ^-\) \((\mu ^+)\) from an oxygen target.
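As an illustration of how such a shift could be applied, the following minimal Python sketch stores the muon values quoted above in a lookup table and adds them to the outgoing lepton momentum. The table layout, the function name and the example momentum are assumptions made for illustration and are not taken from NEUT.

```python
# Momentum shifts for outgoing muons, as quoted in the text (MeV).
# Keys are (target nucleus, muon charge); the layout is an illustrative choice.
COULOMB_SHIFT_MEV = {
    ("C", -1): -3.6,  # mu- from a carbon target
    ("C", +1): +2.6,  # mu+ from a carbon target
    ("O", -1): -4.3,  # mu- from an oxygen target
    ("O", +1): +3.3,  # mu+ from an oxygen target
}


def coulomb_corrected_momentum(p_mev, target, muon_charge):
    """Return the muon momentum after adding the Coulomb shift.

    Only the muon values given in the text are covered; the shift for
    electrons is not quoted there and is therefore not included here.
    """
    return p_mev + COULOMB_SHIFT_MEV[(target, muon_charge)]


# Hypothetical example: a 600 MeV mu- produced on oxygen.
p_corrected = coulomb_corrected_momentum(600.0, "O", -1)
```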

Table 3 The parameters included in the 1p1h uncertainty model with their values and uncertainties before the ND analysis. The uncertainties for the removal energy parameters are around their central value and contain the carbon–oxygen and \(\nu \)–\(\overline{\nu } \) correlations described in the text. The first five \(Q^2\) parameters are not externally constrained before the analysis, and are free to vary between 0 and \(\infty .\) The units of the \(Q^2\) ranges are \(\text {GeV}^2\)

5.2 Uncertainty parametrisation

Mismodelling of neutrino interactions can bias the measurements of oscillation parameters – for instance attributing an increase in single-ring events to an increase in 2p2h interactions instead of CCQE interactions. It is crucial to evaluate the impact that plausible variations of NEUT’s interaction model can have on the neutrino oscillation analysis. This section describes the chosen parametrisation of such variations and the corresponding parameters’ uncertainties. When possible, theory-driven uncertainties are used, but in many cases this offers insufficient freedom to describe available data, and additional empirically driven parameters are required. To cover the caveats of such an approach, and to consider plausible model variations not included in the model parametrisation, a variety of simulated data studies are performed. These are detailed in Sect. 5.3, and applied to the oscillation analysis in Sect. 9 and Appendix B.

5.2.1 1p1h uncertainties

The 1p1h uncertainty model is split into three categories: removal energy related to the initial state described by the SF, the neutrino-nucleon interaction, and ad hoc freedoms in \(Q^2\) from nuclear effects, amongst others, inspired by external data. The central values and uncertainties are summarised in Table 3.

Removal energy: A mismodelling of nucleon removal energy would directly bias the reconstructed neutrino energy, which would subsequently bias the extraction of the neutrino oscillation parameters, notably \(\varDelta m^2.\) This was identified as a leading source of uncertainty in a simulated-data study in the last T2K oscillation analysis [1, 2]. In this analysis, a more reliable modelling of removal energy with accompanying uncertainties was developed.

Unlike the simplistic Fermi-gas models used in the previous iterations of T2K’s neutrino oscillation analyses, the SF model does not have a single fixed value for the nuclear binding energy that can be varied as a parameter. Instead, the SF removal energy distribution, extracted largely from exclusive electron scattering data, reflects the shell structure of the nucleus, shown earlier in Fig. 8. The positions of the removal energy peaks, used as an input to the SF model, are measured with a resolution of \(2{-}6~\text {MeV}\) [78] and lower [79]. Measurements of the peak positions for carbon differ by up to 2 MeV for the s-shell and 6 MeV for the p-shell [58]. The relative strength of each peak also has an uncertainty of up to 10% for carbon [43, 80]. To extract a SF from \((e,e^{\prime } p)\) data, the impact of nuclear effects such as FSI must be incorporated, and an uncertainty of 5 MeV in this correction is applied [58]. In view of these uncertainties, a global removal energy shift uncertainty of 6 MeV is included in the analysis alongside a 3 MeV uncertainty on the difference between the carbon and oxygen removal energies. Further uncertainties are accounted for by the introduction of parameters that allow freedom as a function of \(Q^2,\) described in more detail below.

The construction of the SF from \((e,e^{\prime } p)\) data, and the associated uncertainties, can only be directly applied to modelling 1p1h neutrino interactions with initial-state protons, i.e. anti-neutrino CCQE interactions. The SF for initial-state neutrons cannot be directly constrained in the same way and the implementation in NEUT assumes that protons and neutrons have the same removal energy distributions. However, as can be seen in Fig. 8, nuclear shell models predict that this is not the case. Calculations suggest that proton and neutron ground states differ in their removal energy by \(1{-}4~\text {MeV},\) depending on the shell and target [58]. For the sharper p-shells, where an energy shift is more important relative to the width of the shell, the offset between the SF and the model calculations for neutrons is around 4 MeV for oxygen and 2 MeV for carbon. To account for this, the central value removal energies of the SF for neutrino interactions are shifted by these amounts, and an uncertainty of 4 MeV is applied on the difference between neutrino and anti-neutrino removal energies.

The removal energy shifts are encoded in four parameters depending on whether they affect initial-state protons (\(\overline{\nu }\) CCQE interactions) or neutrons (\(\nu \) CCQE interactions), and whether the target is carbon or oxygen: \(\varDelta E_{rmv}^{\nu O},\) \(\varDelta E_{rmv}^{\overline{\nu }O},\) \(\varDelta E_{rmv}^{\nu C},\) \(\varDelta E_{rmv}^{\overline{\nu }C}.\) The removal energy parameters shift a CCQE event’s outgoing lepton momentum, and the size of the shift depends on the event’s lepton kinematics, neutrino energy, and neutrino flavour.

“Low \(Q^2\)” parameters: NEUT’s cross section for charged-current interactions leaving no mesons in the final state (CC0\(\pi \)) must be suppressed at low \(Q^2\) to match recent measurements from MINERvA [81, 82] and T2K [83, 84]. This is often applied as a suppression of the CCQE cross section via the inclusion of a nuclear screening effect using the Random Phase Approximation (RPA) [60]. However, such effects are not included in the SF CCQE model used in this analysis. Since the SF model is built largely on the impulse approximation – which is expected to break down at low momentum transfers \(\lesssim 400~\text {MeV}/c\) [54] – extra uncertainties are added in the region where discrepancies with measurements are observed.

The low \(Q^2\) suppression is implemented as five parameters which alter the normalisation of the CCQE cross section in a particular \(Q^2\) range. The parameters span \(Q^2=\{0,0.25\}~\text {GeV}^2\) and are split into sub-ranges of \(0.05~\text {GeV}^2.\) Since the origin of this low \(Q^2\) suppression in SF predictions is poorly understood, these parameters do not have an external constraint. Whilst this free parametrisation is effective at facilitating a ND-driven modification to the CCQE cross section, the lack of a theoretical basis limits the model’s overall predictive power. Several simulated data studies are therefore discussed in Sect. 5.3 to evaluate the bias from this technique in the extraction of neutrino oscillation parameters.
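A minimal sketch of such a binned normalisation is given below. It assumes five equal-width, half-open sub-ranges and that events at or above \(0.25~\text {GeV}^2\) are left unweighted by these parameters; the function name and the example parameter values are hypothetical and do not reproduce the analysis code.

```python
import bisect

# Upper edges of the five sub-ranges in GeV^2, each 0.05 GeV^2 wide,
# covering 0 to 0.25 GeV^2 as described in the text.
LOW_Q2_UPPER_EDGES = [0.05, 0.10, 0.15, 0.20, 0.25]


def low_q2_weight(q2, norms):
    """Return the normalisation weight for a CCQE event with true Q^2 = q2.

    norms holds five parameters, one per sub-range, each allowed to vary
    between 0 and infinity. Events with Q^2 >= 0.25 GeV^2 get weight 1.
    """
    if len(norms) != len(LOW_Q2_UPPER_EDGES):
        raise ValueError("expected one normalisation per sub-range")
    if any(n < 0.0 for n in norms):
        raise ValueError("normalisations must be non-negative")
    if q2 < 0.0:
        raise ValueError("Q^2 must be non-negative")
    if q2 >= LOW_Q2_UPPER_EDGES[-1]:
        return 1.0
    # bisect_right maps q2 in [edge_{i-1}, edge_i) to index i.
    return norms[bisect.bisect_right(LOW_Q2_UPPER_EDGES, q2)]


# Hypothetical parameter values that suppress the lowest Q^2 bins most strongly.
example_norms = [0.7, 0.8, 0.9, 0.95, 1.0]
w = low_q2_weight(0.12, example_norms)
```

Because the weight is piecewise constant, an event's contribution changes discontinuously across a bin edge, which is one reason the number and width of such bins is a modelling choice.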

\(M_A^{QE}\) and “high \(Q^2\)” parameters: The nucleon axial mass, \(M_A^{QE},\) is tuned to neutrino-deuterium scattering data in NUISANCE [85]. CCQE cross-section data from ANL [86, 87], BNL [88], BEBC [89], and FNAL [90] is used, and deuterium nuclear effects [91] and flux uncertainties for ANL and BNL are included. The central value and its uncertainty are adjusted and inflated to cover the result and previous global fit results [92], giving \(M_A^{QE}=1.03\pm 0.06~\text {GeV}.\)

Uncertainties on the higher \(Q^2>0.25~\text {GeV}^2\) predictions of the SF model are driven by the axial component of the neutrino-nucleon interaction, where the dipole model may be inadequate [93]. An additional three “high \(Q^2\)” parameters are added to allow an ad hoc freedom, with the goal of lessening the extent to which \(M_A^{QE}\) is used as an effective parameter to correct for deviations from the dipole model. The \(Q^2\) ranges and uncertainties of the new high \(Q^2\) parameters are based on comparisons of the \(Q^2\) shape of the dipole and z-expansion models [93].

5.2.2 2p2h uncertainties

The uncertainties related to 2p2h interactions are similar to those in T2K’s previous oscillation analysis [1, 2]. Parameters altering the 2p2h normalisation independently for neutrinos and anti-neutrinos, and for carbon and oxygen interactions, are used. The 2p2h normalisations are unconstrained, and the carbon–oxygen scaling parameter has a 20% prior uncertainty. A separate shape uncertainty is also applied, which allows shifts in the \(\varDelta \) and non-\(\varDelta \) contributions in the energy and momentum transfer to the nucleus, \((q_0,|{{\textbf {q}}}|),\) of the Nieves model, also separated for carbon and oxygen interactions.

This analysis also includes additional new uncertainties that reflect the shape of the energy dependence of 2p2h using three different plausible models of the process, also studied by T2K cross-section analyses [94, 95]. The uncertainties span the maximal difference in 2p2h predictions from Martini et al. [96], Nieves et al.  [60], and SuSAv2 [97, 98], shown in Fig. 9. Four parameters are added which control the shape of the energy dependence of 2p2h below and above \(E_{\nu }=600~\text {MeV},\) and are separately applied to neutrino and anti-neutrino events.

Fig. 9

Cross-section predictions for \(\nu _{\mu }\) (solid) and \(\overline{\nu } _{\mu }\) (dashed) 2p2h interactions on \(^{12}{\text {C}}\) from Martini et al. [96], Nieves et al. [60], and SuSAv2 [97, 98]

5.2.3 Single-pion production uncertainties

The uncertainty treatment for SPP remains almost identical to previous T2K analyses [1, 2, 99]. There are three central parameters in the modified RS model: the resonant axial mass, \(M_A^{RES};\) the value of the axial form factor at zero transferred four-momentum, \(C_5^{A}(Q^2=0);\) and the normalisation of the \(I_{1/2}\) non-resonant component. As for \(M_A^{QE},\) the parameters have been tuned to deuterium bubble chamber data using NUISANCE [85], selecting SPP data from ANL [100, 101] and BNL [102, 103], including corrected data [104]. The uncertainties are inflated so that the model adequately describes the SPP cross section in different hadronic mass regions from ANL and BNL, and SPP cross-section measurements on nuclear targets from MiniBooNE  [105,106,107] and MINERvA  [108,109,110,111].

A new parameter was introduced for anti-neutrino interactions producing low momentum pions, which constitute a background for the single-ring \(\overline{\nu }\)-mode samples. This extra freedom is added through an \(I_{1/2}\) non-resonant normalisation parameter that affects both \(\overline{\nu } _{\mu }\) and \(\overline{\nu } _{e}\) single pion interactions with \(p_\pi <200~\text {MeV}/c\) in the Rein–Sehgal model. The parameter is not constrained by the ND and has an uncertainty of 100%.

Normalisation parameters on the CC and NC coherent cross sections are included separately, and each is assigned an uncorrelated 30% uncertainty based on comparisons to MINERvA data [112]. The uncertainty on coherent scattering is fully correlated between carbon and oxygen.

5.2.4 Deep inelastic scattering uncertainties

DIS interactions make a small contribution to the samples in this oscillation analysis due to T2K’s neutrino energy. Nevertheless, uncertainties that cover variations in muon kinematics from CC DIS interactions are needed for the ND fit, whose selections contain some multi-\(\pi \) events, and have been significantly updated from previous analyses [99].

As discussed in Sect. 5.1, NEUT uses PDFs with BY corrections to calculate the DIS cross section. The uncertainty in the BY corrections is parametrised as a fraction of the difference between using the GRV98 PDFs with and without the BY corrections. At \(Q^2>1.5~\text {GeV}^2\) the impact is marginal, but in the peak region at lower \(Q^2\) the impact is large, altering the predicted cross section by \(\sim 40\%.\) This parameter is split for \(W < 2~\text {GeV}\) (multi-\(\pi \)) and \(W>2~\text {GeV}\) (DIS) interactions.

Another parameter is introduced to modify the generation of the hadronic state for \(W < 2~\text {GeV}\) DIS interactions, which uses a custom model [55] to choose the particle multiplicities in an event. This parameter accounts for the differences between the custom model and the AGKY model [113] used in the GENIE event generator [114].

Two normalisation uncertainties are also included, motivated by comparing the NEUT CC-inclusive cross section to the world average of measurements at higher neutrino energies [11]. The uncertainties are \(3.5\%\) for neutrino interactions and \(6.5\%\) for anti-neutrino interactions, and the two are uncorrelated.

5.2.5 Final-state interactions uncertainties

The NEUT pion cascade model has been tuned to better match external \(\pi -A\) scattering data [115]. The tuning procedure constrains the probability for different interaction processes to occur in the pion cascade (e.g. pion absorption or charge exchange), and is notably more robust than previous parametrisations. The constraints on the pion FSI cascade from the ND analysis are propagated to the FD in this analysis, which was not done before. Furthermore, the simulations at the ND and the FD now use a consistent model for pions from the interaction vertex propagating through the nucleus (“pion final-state interactions”), and for pions propagating through the detector (“pion secondary interactions”), mentioned later in Sect. 6.2. The ND constraint on the FSI parameters is only used to constrain the FD modelling of FSI and not the FD modelling of secondary interactions.

5.2.6 Other uncertainties

Additional uncertainties are applied to processes with small contributions to the analysis. As in previous analyses, the NC\(1\gamma \) production cross section has a 100% normalisation uncertainty. The NC elastic, NC resonant kaon and eta production, and NC DIS interactions are grouped together and referred to as “NC other” interactions, which have a 30% normalisation uncertainty that is uncorrelated at ND and FD. There is one uncertainty controlling the normalisation of the electron neutrino cross section, and another controlling the electron anti-neutrino cross section. The uncertainties are composed of two parts: one 2% uncorrelated part and one 2% anti-correlated part, which connects the two parameters [116]. The parameters only affect electron (anti-)neutrino interactions, and have no effect on the other neutrino flavours. The total cross sections of CC resonant single-photon production, CC resonant kaon production, CC resonant eta production, and CC diffractive pion production are controlled by a single new parameter referred to as “CC misc”, which is a 100% normalisation uncertainty, and such interactions are not affected by other model parameters. Two new parameters are included to account for Coulomb corrections [117, 118]. They control the normalisation of the (anti-)neutrino cross section for \(E_{\nu }=0.4{-}0.6~\text {GeV}\) with a 2% (1%) uncertainty, and are 100% anti-correlated.
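The two-part construction of the electron (anti-)neutrino uncertainties can be written as a \(2\times 2\) covariance matrix. The sketch below assumes that the uncorrelated and anti-correlated parts are independent and add in quadrature, which is an interpretation of the description above and not a statement of how the analysis builds its prior. Under that assumption each parameter has a total prior width of about 2.8% and the two parameters have a correlation of \(-0.5.\) The variable names are illustrative.

```python
import math

UNCORRELATED = 0.02    # 2% uncorrelated part (from the text)
ANTICORRELATED = 0.02  # 2% anti-correlated part (from the text)

# Assumption: the two parts are independent, so variances add in quadrature.
variance = UNCORRELATED**2 + ANTICORRELATED**2
# Only the anti-correlated part links the two parameters, with a negative sign.
covariance = -(ANTICORRELATED**2)

# Rows and columns are ordered as (nu_e, anti-nu_e).
cov_matrix = [
    [variance, covariance],
    [covariance, variance],
]

total_uncertainty = math.sqrt(variance)  # per-parameter prior width
correlation = covariance / variance      # correlation coefficient
```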

5.3 Simulated data studies

The systematic uncertainties in the analysis are constructed to account for known uncertainties in neutrino interaction physics, but cannot possibly cover every model scenario. For instance, cross-section measurements from T2K and other experiments have shown that no single 1p1h model describes the kinematic phase space in T2K and MINERvA [81, 94, 95, 119,120,121,122]. In addition, the ND analysis, presented later in Sect. 6, may compensate for cross-section mismodelling by varying the flux parameters instead of the cross-section parameters, leading to good agreement with the observed event spectrum in lepton kinematics. However, the fitted model may scale the effect incorrectly in other important physics variables, e.g. \(E_{\nu }.\) It is therefore crucial to test whether the uncertainty model is flexible enough to capture variations under alternative cross-section models which are not directly implemented in the default uncertainty model, and whether the subsequent extrapolation of model constraints to the FD has an effect on constraining the oscillation parameters.

Some of the simulated data sets are similar to those presented in T2K’s previous analyses [1, 2]. The studies are updated due to the significant changes in the uncertainty model and ND analysis. The alternative models and tunes are selected to cover a number of interaction types and effects, listed next.

CC\(0\pi \) simulated data sets: The dominant CC\(0\pi \) samples at the ND and the single-ring samples at the FD are designed to select CCQE-like events. The large statistics in these samples require the robustness of the neutrino interaction model to be tested against a range of alternative models.

  • Non-CC-Quasi-Elastic (non-CCQE) contributions – Before the fit to data, the prediction of the CC\(0\pi \) selection at the ND is underestimated by \(0{-}20\%,\) depending on the outgoing lepton kinematics. Projecting the data and prediction onto the reconstructed four-momentum transfer, \(Q^2_{rec},\) defined as the \(Q^2\) calculated for a CCQE interaction on a stationary nucleon, and with a binding energy \(E_b,\)

    $$\begin{aligned} Q^2_{rec}&= 2E_{\nu }^{rec}\left( E_{\mu } - |{\vec {p}}_{\mu }|\cos \theta _{\mu }\right) - m^2_{\mu } \end{aligned}$$
    (3)
    $$\begin{aligned} E_{\nu }^{rec}&= \frac{1}{2} \frac{m^2_{\mu }+(m_{n}^{eff})^2-m^2_p -2E_{\mu } m_n^{eff}}{E_{\mu } - |{\vec {p}}_{\mu }|\cos \theta _{\mu } - m_n^{eff}} \nonumber \\ m_n^{eff}&=m_n-E_b \end{aligned}$$
    (4)

    the discrepancy is less than 5% at \(Q^2_{rec}<0.1~\text {GeV}^2\) and approximately 20% for higher \(Q^2_{rec}.\) The CCQE cross section is modified after the fit to ND data to account for the difference. This simulated data set tests the hypothesis that the underestimation of data is actually due to non-CCQE contributions, and does so by scaling up their predictions instead of the CCQE components. The study is given in detail in Appendix B.

  • Alternative CCQE form factors – The baseline model used in this analysis assumes a dipole parametrisation of the nucleon form factor. There are other form factor models, of which the 3-component (an extension of Ref. [123]) and z-expansion [93] formalisms were tested. The effect is largely expected to be covered by the \(Q^2\)-related freedoms of the cross-section model.

  • Multi-nucleon (2p2h) model – The Nieves et al. model [60] was used to describe 2p2h interactions in this analysis. An alternative 2p2h model from Martini et al.  [96] was tested in the simulated data studies, because its 2p2h cross section is larger and evolves differently in \(E_{\nu }\) for neutrinos and anti-neutrinos, shown earlier in Fig. 9. Modelling the 2p2h spectrum is important in the ND to FD extrapolation, as it is one of the main sources of bias in the reconstructed neutrino energy spectrum of CCQE-like samples. The SuSAv2 model [97, 98], also shown earlier in Fig. 9, is a less extreme variation compared to the Martini model, so was not included.

  • Removal energy – The nuclear removal energy in the relativistic Fermi gas (RFG) model [124] was the largest contributor to uncertainty in the previous T2K analysis [1, 2]. This analysis’ spectral function (SF) model [43], mentioned earlier in Sect. 5.1, introduced an improved parametrisation for the removal energy uncertainty, and simulated data sets were developed to study its impact.
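The reconstructed quantities in Eqs. (3) and (4) can be evaluated directly from the muon momentum and angle. The following Python sketch does so in natural units (MeV). The function names, the rounded particle masses and the binding energy used in the example are assumptions made for illustration; the value of \(E_b\) used in the analysis is not stated in this section.

```python
import math

M_MU = 105.658  # muon mass in MeV (rounded)
M_P = 938.272   # proton mass in MeV (rounded)
M_N = 939.565   # neutron mass in MeV (rounded)


def e_nu_rec(p_mu, cos_theta, e_b):
    """Reconstructed neutrino energy of Eq. (4), in MeV.

    p_mu is the muon momentum, cos_theta the cosine of the muon angle with
    respect to the neutrino direction, and e_b the binding energy.
    """
    e_mu = math.sqrt(p_mu**2 + M_MU**2)
    m_eff = M_N - e_b
    numerator = M_MU**2 + m_eff**2 - M_P**2 - 2.0 * e_mu * m_eff
    denominator = e_mu - p_mu * cos_theta - m_eff
    return 0.5 * numerator / denominator


def q2_rec(p_mu, cos_theta, e_b):
    """Reconstructed four-momentum transfer of Eq. (3), in MeV^2."""
    e_mu = math.sqrt(p_mu**2 + M_MU**2)
    e_nu = e_nu_rec(p_mu, cos_theta, e_b)
    return 2.0 * e_nu * (e_mu - p_mu * cos_theta) - M_MU**2


# Hypothetical muon: 500 MeV momentum, cos(theta) = 0.9, E_b = 27 MeV
# (the binding energy is an illustrative value only).
example_e_nu = e_nu_rec(500.0, 0.9, 27.0)
example_q2 = q2_rec(500.0, 0.9, 27.0)
```

Both functions assume a CCQE interaction on a stationary nucleon, so for 2p2h or pion-production events the returned values are biased estimates of the true quantities.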

CC1\(\pi \) simulated data sets: Single-pion events are a background for the single-ring selections at the FD and contribute to the bias in reconstructed neutrino energy. Additionally, the 1R\(e\)1d\(e\) sample at the FD specifically targets single-pion events, which motivates the need to have a robust uncertainty model of these interactions. Three simulated data sets were produced:

  • ND data-driven pion momentum modification – The 1R\(e\)1d\(e\) selection at the FD tags low momentum pions below Cherenkov threshold by the presence of a delayed Michel electron. The ND analysis in Sect. 6 uses selections based on muon kinematics and pion tagging to constrain the uncertainties, and does not study the pion kinematics directly. As such, single-pion events may be modelled well in muon kinematics and poorly in pion kinematics. A data-driven simulated data set was created by studying the CC\(1\pi \) selections at the ND, using the model that was fit to ND data in lepton kinematics. The model was used to predict the reconstructed pion momentum spectrum, \(p_{\pi }^{reco},\) in the single-pion ND selections, which was compared to the data in the \(p_{\pi }^{reco}<200~\text {MeV}/c\) region. The number of events was underestimated by \(\sim 20\%,\) which was applied as an overall normalisation to the simulation of all single-pion events at the FD that had a pion with generated (true) momentum below \(200~\text {MeV}/c.\) This is the only simulated data set that was not applied at the ND and was tested only at the FD.

  • MINERvA pion suppression – A low-\(Q^2\) suppression of the single-pion production cross section in GENIE [114] was needed to consistently describe neutrino interactions on plastic scintillator (CH) from MINERvA and bubble chamber data on nucleons [125]. The function parametrising this discrepancy was used to create simulated data at both the ND and the FD and the study is presented in detail in Appendix B.

  • Pion secondary interactions – This analysis introduced a new model for pions rescattering in the ND. The GEANT4 model [49] was replaced with NEUT’s Salcedo–Oset model [75, 76] which was tuned to \(\pi -A\) scattering data [44]. A hybrid simulated data set which blended features of the two models was used in the ND analysis to study the impact of choosing one model over the other.
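The ND data-driven pion momentum modification in the list above amounts to a per-event weight at the FD. A minimal sketch is shown below; it assumes that the \(\sim 20\%\) underestimate is corrected by a scale factor of 1.2, which is one possible reading of the description, and the function and argument names are hypothetical.

```python
PION_MOMENTUM_CUT = 200.0  # MeV/c, threshold quoted in the text
LOW_PION_SCALE = 1.2       # assumption: ~20% underestimate -> weight of 1.2


def low_momentum_pion_weight(is_single_pion, true_pion_momentum):
    """Return the simulated-data weight for one FD event.

    Single-pion events whose generated (true) pion momentum is below the
    cut are scaled up; all other events are left unchanged.
    """
    if is_single_pion and true_pion_momentum < PION_MOMENTUM_CUT:
        return LOW_PION_SCALE
    return 1.0
```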

A summary of the simulated data studies is presented in Sect. 9 after the analysis sections, and the simulated data studies are detailed in Appendix B.

6 Near-detector analysis

The high-statistics data at the ND are used to constrain many of the parameters of the neutrino flux and neutrino-nucleus interaction models used in the neutrino oscillation analysis. Sampling the unoscillated neutrinos at a high rate and tuning the prediction to the ND data allows for significant reduction of the uncertainties of the FD prediction. The ND analysis targets CC\(0\pi \) events as these are the signal at the FD, and additionally constrains the background contributions such as CC\(1\pi \) and CC multi-\(\pi \) events. Separation of \(\nu _{\mu }\) and \(\overline{\nu } _{\mu }\) events is possible due to the magnetised sign-selecting ND, and there are \(\nu _{\mu }\) selections in the \(\overline{\nu }\)-mode which constrain the wrong-sign background.

As in previous T2K oscillation analyses [1, 2], two complementary likelihood sampling methods are used and are cross-validated. One is based on Markov Chain Monte Carlo (MCMC) methods [126, 127], and the other is based on minimising a test statistic through gradient-descent methods in Minuit [128]. The MCMC analysis is inherently Bayesian, and has the ability to run a simultaneous ND+FD analysis whose results are presented in Sect. 8.1. The gradient-descent analysis instead fits the systematic uncertainties in the simulation to find the global minimum of the test statistic that best describes the data at the ND, discussed in Sect. 6.3. The central value and covariance matrix of the systematic uncertainties around that best-fit point are then propagated to the FD. The MCMC framework directly implements the removal energy shift parameters described in Sect. 5.2 which allows for discrete event migrations between bins, whereas the gradient-descent framework smooths the effect by an effective binned treatment to avoid discontinuous likelihoods. The MCMC analysis also implements a non-uniform rectangular binning, meaning the binning in the x variable is not uniform in the y variable, allowing the events to be binned finer and more effectively, generally leading to improved sensitivity to the systematic uncertainties. The gradient-descent framework instead uses a uniform rectangular binning. The effect of these differences is tested at the FD by propagating the results from the gradient-descent framework, which assumes correlated Gaussian parameter constraints, and comparing to propagating the constraints from the steps in the MCMC, which is detailed in Sect. 8.3. This section shows the results from the gradient-descent-based analysis.

This analysis of ND data uses \(19.867\times 10^{20}\) POT, with \(11.531\times 10^{20}\) collected in \(\nu \)-mode and \(8.336\times 10^{20}\) collected in \(\overline{\nu }\)-mode, as listed earlier in Table 1. This is an overall POT increase of 106% compared to the previous analysis.

6.1 ND selections

The doubling of \(\overline{\nu }\)-mode data in the ND allowed for a refining of the anti-neutrino selections. Additionally, the \(\overline{\nu }\)-mode beam samples now match the \(\nu \)-mode beam samples in the separation of events by their reconstructed pion multiplicity. Previous analyses only split the \(\overline{\nu }\)-mode selections into events with a single muon-like track (CC-1Track) and events with a single muon-like track with at least one charged or neutral pion candidate (CC-NTrack).

The events are categorised into 18 samples, split into nine equivalent FGD1 and FGD2 samples to separate neutrino interactions on plastic scintillator (FGD1), and plastic scintillator and water (FGD2). The samples first require a reconstructed muon to be present. They are then split by the sign of the muon candidate – which implies the identity of the incoming neutrino – classifying events as \(\nu _{\mu }\) events in \(\nu \)-mode, \(\overline{\nu } _{\mu }\) events in \(\overline{\nu }\)-mode, and \(\nu _{\mu }\) events in \(\overline{\nu }\)-mode. Each of these charged-current inclusive selections is separated into three reconstructed topologies based on the number of reconstructed charged pions. An event with no reconstructed pions is classified as CC\(0\pi \); an event with a single charged pion with opposite charge to the muon is CC\(1\pi \); and an event with any other number of charged pions (e.g. \(1\mu ^{-}2\pi ^+\) or \(1\mu ^{-}1\pi ^-\) in \(\nu \)-mode), or at least one neutral pion, is classified as CC other. There is no requirement on the number of proton tracks and there is no dedicated \(\nu _{e}\) or \(\overline{\nu } _{e}\) selection.
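The categorisation described above can be summarised as a small decision function. The Python sketch below is an illustration only: the `RecoEvent` container, its field names and the sample-name strings are hypothetical and do not correspond to the ND software.

```python
from dataclasses import dataclass


@dataclass
class RecoEvent:
    fgd: int          # 1 or 2: FGD containing the reconstructed vertex
    beam_mode: str    # "nu" or "nubar"
    muon_charge: int  # -1 for mu-, +1 for mu+
    n_pi_plus: int    # reconstructed pi+ candidates
    n_pi_minus: int   # reconstructed pi- candidates
    n_pi_zero: int    # reconstructed pi0 candidates


def topology(event):
    """Classify an event as CC0pi, CC1pi or CC other from its pion content."""
    n_charged = event.n_pi_plus + event.n_pi_minus
    if n_charged == 0 and event.n_pi_zero == 0:
        return "CC0pi"
    # Pions with charge opposite to the muon: pi+ for mu-, pi- for mu+.
    opposite = event.n_pi_plus if event.muon_charge < 0 else event.n_pi_minus
    if event.n_pi_zero == 0 and n_charged == 1 and opposite == 1:
        return "CC1pi"
    return "CC other"


def sample_name(event):
    """Return the sample label, or None if no such selection is described."""
    species = "numu" if event.muon_charge < 0 else "numubar"
    if event.beam_mode == "nu" and species == "numubar":
        return None  # no anti-neutrino selection in nu-mode is listed above
    return f"FGD{event.fgd} {species} in {event.beam_mode}-mode {topology(event)}"
```

With two FGDs, three sign selections and three topologies, the labels produced by `sample_name` enumerate the 18 samples.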

The pion tagging in the \(\nu _{\mu }\) selections is the same as in previous T2K analyses [1, 2]. A pion is tagged by either a pion-like track in the TPC, a pion-like track contained in the FGD, or an isolated delayed Michel electron in an FGD. In the FGD and TPC tagging, the pion candidate is required to share its vertex with the muon candidate, and for the Michel tag it is required to be in the same FGD as the candidate vertex. For the anti-neutrino selections, TPC and FGD pion-like tracks are identified similarly to the neutrino selections, whilst the Michel tag can only identify positively charged pions since negatively charged pions are more likely to be absorbed. For \(\nu \)-mode selections, Michel-tagged pions dominate for \(p_\pi <175~\text {MeV}/c,\) TPC-tagged pions dominate when \(p_\pi >250~\text {MeV}/c,\) and the FGD-contained pions make up 30% of all pion tags when \(100~\text {MeV}/c<p_\pi <250~\text {MeV}/c.\) There are virtually no Michel-tagged or FGD-contained pions when \(p_\pi >400~\text {MeV}/c.\) Combining the tags, the selection has about \(25\%\) charged pion tagging efficiency when \(p_\pi < 300 ~\text {MeV}/c,\) increasing roughly linearly to \(\sim 50\%\) at \(p_\pi =1~\text {GeV}/c.\) Neutral pions are tagged by identifying a displaced \(e^{\pm }\) candidate in the TPC, indicating the presence of a photon conversion.

Table 4 Efficiencies and purities for each of the selections at the ND in this analysis, including wrong-sign background components. The efficiency is defined as the number of events that have a reconstructed selection that matches the true selection, divided by the total number of events with that same true selection. The purity is defined as the number of events with the desired selection divided by the total number of events in the selection

The efficiencies and purities are determined from reconstructed simulated events, and are provided in Table 4, which shows similar performance for the two FGDs. FGD2 has worse Michel tagging and FGD-contained track reconstruction than FGD1 due to the passive water layers, resulting in a lower efficiency for CC\(1\pi \) selections. The purity for CC\(0\pi \) selections for \(\nu \)-mode and \(\overline{\nu }\)-mode is above 70%, and \(\sim 55\%\) for the \(\nu _{\mu }\) in \(\overline{\nu }\)-mode due to the wrong-sign neutrino flux having a longer tail, which makes multi-particle final states more likely. The \(\overline{\nu } _{\mu }\) CC\(0\pi \) efficiency is higher than \(\nu _{\mu }\) CC\(0\pi \) due to \(\overline{\nu } _{\mu }\) CCQE interactions usually producing a neutron in lieu of the proton from \(\nu _{\mu }\) CCQE interactions. In \(\nu _{\mu }\) CCQE interactions, the outgoing proton may produce a clear track in the detector, which has a probability of being mis-tagged as a \(\pi ^+\) (or \(\mu ^+\) for \(\overline{\nu } _{\mu }\) selections), so that the event enters another selection; this is very unlikely when the outgoing particle is a neutron. Furthermore, \(\overline{\nu } _{\mu }\) interactions generally produce a larger proportion of forward-going events, where the ND has better acceptance.
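The efficiency and purity definitions given in the caption of Table 4 can be written compactly. The sketch below is illustrative only: the function name and label strings are hypothetical, and it assumes each simulated event carries exactly one true and one reconstructed selection label.

```python
def efficiency_and_purity(true_labels, reco_labels, selection):
    """Compute (efficiency, purity) for one selection from per-event labels.

    efficiency = events with true == reco == selection / events with true == selection
    purity     = events with true == reco == selection / events with reco == selection
    """
    pairs = list(zip(true_labels, reco_labels))
    n_true = sum(1 for t, _ in pairs if t == selection)
    n_reco = sum(1 for _, r in pairs if r == selection)
    n_both = sum(1 for t, r in pairs if t == selection and r == selection)
    efficiency = n_both / n_true if n_true else float("nan")
    purity = n_both / n_reco if n_reco else float("nan")
    return efficiency, purity


# Toy labels (invented for illustration, not T2K simulation output).
true_labels = ["CC0pi", "CC0pi", "CC1pi", "CC0pi"]
reco_labels = ["CC0pi", "CC1pi", "CC0pi", "CC0pi"]
eff, pur = efficiency_and_purity(true_labels, reco_labels, "CC0pi")
```

In this toy input, two of the three true CC\(0\pi \) events are reconstructed as CC\(0\pi \), and two of the three events reconstructed as CC\(0\pi \) are truly CC\(0\pi \), so both quantities equal 2/3.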

The \(\overline{\nu } _{\mu }\) CC other selections’ low purities compared to the \(\nu _{\mu }\) in \(\nu \)-mode and \(\nu _{\mu }\) in \(\overline{\nu }\)-mode equivalents stem from the larger wrong-sign background that, for the reasons stated earlier, produces multiple pions which may be wrongly selected as the \(\mu ^+\) candidate. In addition, the muon candidate in \(\overline{\nu }\)-mode can in fact be a mis-identified high-momentum proton around \(p\sim 1~\text {GeV}/c,\) where the energy loss in the TPC for a proton is similar to that of a muon. This track confusion seldom happens in the \(\nu _{\mu }\) selections, since they select a negatively charged track. A \(\pi ^-\) is rarely selected as the \(\mu ^-\) in \(\nu _{\mu }\) selections since it requires a higher energy multi-\(\pi \) event or final-state interactions of a hadron from the primary interaction.

Generally, the mis-identification of the muon candidate is largest at low momentum, when it does not leave a long enough track to reliably assess the degree of bending in the ND’s magnetic field. Almost all wrong-sign muons, pions and electrons selected as the muon candidate occupy this region. In the case of mis-identification, the muon candidate is otherwise a pion with the same charge, due to their similar energy loss in the FGDs and TPCs. Using the combined FGD+TPC detector system, there is a 94%, 86%, and 77% probability that the muon candidate is a muon in the CC\(0\pi \), CC\(1\pi \) and CC other selections, respectively.

6.2 ND related uncertainties

Dedicated control samples have been developed to evaluate the response of the ND and to quantify systematic uncertainties [129]. These uncertainties include the modelling of pion and proton secondary interactions in the detector, particle mis-identification probabilities in the TPCs and FGDs, magnetic field distortions, momentum resolutions and scales, efficiencies related to clustering, tracking and track matching, Michel-tagging efficiencies, pile-up, FGD mass, out of fiducial volume (OOFV) background events, and sand muon backgrounds. Sand muon backgrounds enter the selections when neutrinos from a beam spill interact in the sand surrounding the ND pit, creating a muon that enters the ND. These uncertainties can migrate events into or out of selections and change the reconstructed particles’ kinematics. The uncertainties can either be efficiency-like (dependent on a particle’s kinematics) or normalisation-like (independent of a particle’s kinematics).

This analysis is the first to use NEUT’s semi-classical Salcedo–Oset cascade model [75, 76], mentioned in Sect. 5.1.5, for pion secondary interactions in the detector, where previous analyses used GEANT4 [49]. The model was tuned to external \(\pi -A\) scattering data [44], and was found to agree better with data and be more consistent across the interaction channels and pion energy ranges compared to GEANT4. Additionally, T2K now uses the same model for pion final-state and secondary interactions in both the ND and the FD. The ND constraint on pion final-state interactions is propagated to the FD, whereas the constraint on the secondary interactions is not.

Table 5 Uncertainties on the total number of events in the ND analysis from detector uncertainties only, broken down by selection
Table 6 Number of events in each of the ND selections for data and the ratio to the prediction before and after the fit to data

The detector-related uncertainties are presented in Table 5, and are \(1.2{-}2.1\%\) for the CC\(0\pi \) selections, and \(2.5{-}4.0\%\) for the CC\(1\pi \) and CC other selections. The secondary interaction uncertainty for pions contributes \(70{-}95\%\) of the total detector-related uncertainty, depending on the selection. For reference, the statistical uncertainty on the number of events in the ND selections, presented later in Table 6, is \(0.5{-}1.3\%\) for the \(\nu \)-mode selections, and \(1.1{-}3.9\%\) for the \(\overline{\nu }\)-mode selections.

6.3 Defining the likelihood

Each selection is binned in the reconstructed muon momentum, \(p_{\mu }\), and the cosine of the muon angle with respect to the detector z-axis, \(\cos {\theta _{\mu }},\) which nearly lines up with the average neutrino direction.\(^{1}\) The ND likelihood is constructed by calculating the \(-2\ln {\mathscr {L}}_{\text {total}}\) of the data and simulation (MC) across all bins in all samples at each set of the parameter values. The systematic uncertainties in the models for the ND response, neutrino interactions, and neutrino flux, detailed in previous sections, are encoded via a Gaussian penalty term, which includes the covariances between the systematic uncertainties, shown in Eq. 7. The treatment of statistical uncertainties in the simulation has been updated [130, 131] and was validated against a complementary approach [132] and the previously used method. The total likelihood is defined as

$$\begin{aligned} {\mathscr {L}}_{\text {total}} = {\mathscr {L}}_{\text {stat}} \times {\mathscr {L}}_{\text {MC stat}} \times {\mathscr {L}}_{\text {syst}} \end{aligned}$$
(5)

where \({\mathscr {L}}_{\text {stat}}\) is the statistical likelihood, \({\mathscr {L}}_{\text {MC stat}}\) is the MC statistical uncertainty likelihood, and \({\mathscr {L}}_{\text {syst}}\) is the likelihood of the systematic uncertainties. The frequentist analysis maximises \({\mathscr {L}}_{\text {total}}\) by finding the minimum of \(-2\ln {\mathscr {L}}_{\text {total}},\) and the Bayesian analysis samples the \(-2\ln {\mathscr {L}}_{\text {total}}\) around the minimum in proportion to the posterior probability. The first two terms in Eq. 5 are linked, as the statistical uncertainty on the MC affects the number of MC events. The two statistical contributions read,

Fig. 10
Constraints on the \(\nu \)-mode \(\nu _{\mu }\) flux uncertainty parameters at the FD from the fit to ND data (black points, black lines), overlaid on the input uncertainty (red band)

$$\begin{aligned} -2\ln {\mathscr {L}}_{\text {stat}} - 2\ln {\mathscr {L}}_{\text {MC stat}} = 2 \sum ^{\text {samples}}_{i} \sum ^{\text {bins}}_{j} \left[ \left( N_{\text {MC}} - N_{\text {Data}} + N_{\text {Data}}\ln {\frac{N_{\text {Data}}}{N_{\text {MC}}}} \right) + \frac{\left( \beta _j - 1\right) ^{2}}{2\sigma ^{2}_{\beta _j}} \right] \end{aligned}$$
(6)

where, in each bin j of sample i, \(N_{\text {Data}}\) \((N_{\text {MC}})\) is the number of events in data (MC), \(\beta _j\) scales the unweighted MC events, and \(\sigma _{\beta _j}\) is a measure of the MC statistical uncertainty. The systematic uncertainties are parametrised as correlated Gaussian penalties,

$$\begin{aligned} -2\ln {\mathscr {L}}_{\text {syst}} = \left( {\vec {x}}-{\vec {\mu }}\right) ^T {\textbf{V}}^{-1} \left( {\vec {x}}-{\vec {\mu }}\right) \end{aligned}$$
(7)

where \({\vec {x}}\) \(({\vec {\mu }})\) are the values of the systematic uncertainty parameters during (before) the fit, and \({\textbf{V}}\) is their covariance matrix. The ND constrains the flux uncertainty at the FD through such a covariance matrix. The low-momentum \(\overline{\nu } _{\mu }\) SPP, neutrino energy-dependent 2p2h, NC other, NC\(1\gamma ,\) and parameters are barely constrained by the ND analysis, so their constraints are not propagated to the FD in the frequentist analysis. In the simultaneous ND+FD Bayesian analysis, both detectors are used to constrain these parameters.
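To make the structure of Eqs. 5 to 7 concrete, the sketch below evaluates \(-2\ln {\mathscr {L}}_{\text {total}}\) for a list of bins and a small set of systematic parameters. It is an illustration under stated assumptions rather than the analysis code: the function names are hypothetical, \(\beta _j\) is assumed to multiply the nominal MC prediction in its bin, and the inverse covariance matrix is assumed to be supplied precomputed.

```python
import math


def neg2_log_l_stat(n_mc_nominal, n_data, beta, sigma_beta):
    """Eq. 6: Poisson term plus MC-statistics penalty, summed over bins."""
    total = 0.0
    for mc, data, b, s in zip(n_mc_nominal, n_data, beta, sigma_beta):
        n_mc = b * mc  # assumption: beta rescales the bin's MC prediction
        term = n_mc - data
        if data > 0:  # the logarithmic term vanishes for an empty data bin
            term += data * math.log(data / n_mc)
        term += (b - 1.0) ** 2 / (2.0 * s ** 2)
        total += term
    return 2.0 * total


def neg2_log_l_syst(x, mu, v_inv):
    """Eq. 7: (x - mu)^T V^{-1} (x - mu), with V^{-1} given as nested lists."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    n = len(d)
    return sum(d[i] * v_inv[i][j] * d[j] for i in range(n) for j in range(n))


def neg2_log_l_total(n_mc_nominal, n_data, beta, sigma_beta, x, mu, v_inv):
    """Eq. 5 in -2 ln form: the product of likelihoods becomes a sum."""
    return (neg2_log_l_stat(n_mc_nominal, n_data, beta, sigma_beta)
            + neg2_log_l_syst(x, mu, v_inv))
```

In a real fit the MC prediction in each bin would itself depend on the systematic parameters \({\vec {x}}\); that dependence is omitted here to keep the example short.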

6.4 Results of the ND analysis

The ND analysis sees large shape changes in the \(\nu \)-mode \(\nu _{\mu }\) flux parameters, with roughly 10% enhancement at low \(E_{\nu }\) and 10% suppression at high \(E_{\nu },\) as shown in Fig. 10. The neutrino flux parameters have strong correlations with each other and with some cross-section parameters, such as \(M_A^{QE}\) and the \(Q^2\) parameters, shown in Fig. 13. Due to these large correlations, moving the flux parameters by this amount incurs a penalty of \(-2\ln {\mathscr {L}}_{\text {flux}}/N_{\text {dof}}\sim 1,\) as confirmed by the p-value studies in Sect. 6.6.

Fig. 11
Constraints on the CC\(0\pi \) parameters, excluding the CCQE \(Q^2\) parameters, from the fit to ND data (black points, black lines), overlaid on the input uncertainty (red band). The parameters on the left-hand side of the figure are presented as a ratio to the generated value in NEUT, and the right side shows the removal energy parameters, \(E_{rmv},\) with shifts in units of MeV. CCQE interactions are generated in NEUT with \(M_A^{\text {QE}}=1.21~\text {GeV},\) but a pre-fit value of \(1.03~\text {GeV}\) was used after analysis of CCQE bubble chamber data. The absence of an uncertainty band reflects that the parameter was not constrained by external inputs before the analysis of ND data

Fig. 12
Constraints on the CCQE \(Q^2\) parameters as a function of \(Q^2\) from the fit to ND data (black points, black lines), overlaid on the input uncertainty (red band). The absence of an uncertainty band reflects that the parameter was not constrained by external inputs before the analysis of ND data

Figures 11 and 12 show the CC\(0\pi \) cross-section parameters after the fit. Despite the external constraint on \(M_A^{QE},\) the data prefers a larger value of \(M_A^{QE}=1.16~\text {GeV}.\) A complementary fit, changing the uncertainty on \(M_A^{QE}\) to be unconstrained instead of informed by bubble chamber data, had little impact on the ND analysis and the predictions at the FD; hence the constraint on \(M_A^{QE}\) is primarily driven by the ND data. The 2p2h normalisation is different for neutrinos and anti-neutrinos, which are both constrained to \(\sim 15\%\) uncertainty, with the 2p2h normalisation for neutrinos consistent with the prediction from Nieves et al. The 2p2h normalisation for carbon and oxygen is consistent with 1, although the shape parameter for oxygen agrees with the Nieves model, whereas the carbon parameter is pulled to be more \(\varDelta \)-like, differing by \(\sim 1\sigma .\) The removal energy parameters are within their uncertainties before the fit, and are compatible for the carbon, oxygen, neutrino and anti-neutrino parameters.

The CCQE \(Q^2\) parameters are shown in Fig. 12, where there is a suppression at low \(Q^2\) until \(0.2~\text {GeV}^2,\) consistent with other cross-section data mentioned in Sect. 5.2. At higher \(Q^2\) the data prefers an enhancement of \(20{-}30\%.\) The \(Q^2\) parameters have strong anti-correlations with the flux parameters, as shown in Fig. 13, and studies with fixed values of the \(Q^2\) parameters showed that the flux parameters compensate for differences in \(Q^2\) for CCQE events, a testament to the parameters’ correlations.

Fig. 13
Correlations between selected \(\nu \)-mode \(\nu _{\mu }\) FD flux and CCQE cross-section parameters. The flux and \(Q^2\) normalisation parameters’ ranges are in units of GeV. The strong anti-correlations between the flux and cross-section parameters significantly reduce the uncertainties on the predictions at the FD

The 2p2h normalisation has been given the freedom to vary independently for neutrinos and anti-neutrinos, and differences in the 2p2h neutrino and anti-neutrino parameters may reflect a more general mismodelling of CC\(0\pi \) interactions. This may allow deficiencies in the anti-neutrino CCQE model to be absorbed in the 2p2h normalisation parameters. Similarly, \(M_A^{QE}\) and the CCQE \(Q^2\) normalisation parameters may be absorbing effects from a different axial form factor parametrisation, which may evolve differently as a function of other variables, e.g. \(E_{\nu },\) as mentioned in Sect. 5.3. Both of these effects, amongst others, are studied through simulated data studies in Sect. 9 and Appendix B. The full parameter set with their values before and after the analysis of ND data is provided in Appendix E.

The MCMC and gradient-descent analyses differed in the treatment of the removal energy uncertainty. The MCMC allows for discrete movement of events between bins, which may produce multi-modal posterior probability distributions (output constraint) of the removal energy parameters. The smoothed binned implementation in the gradient-descent framework prevents this from disrupting the ability to find the maximum likelihood, whilst still capturing the overall physics behaviour of the removal energy uncertainty. The impact of this and other differences between the analyses, such as the non-uniform rectangular binning scheme, was addressed by separately propagating the covariance matrix from the gradient-descent framework and the parameter variations sampled by the MCMC to the oscillation analysis in Sect. 8.

In general, the constraints on the parameters and the impact of the ND analysis agree with the expected sensitivity. Furthermore, compatible results are found between the MCMC and the gradient-descent analyses in the central value estimates, uncertainties, and correlations of the parameters, leading to consistent sample predictions at the ND and the FD.

6.5 ND predictions

The aforementioned selections in the data and simulation are compared before and after fitting to data, using the constraints on the systematic uncertainties from Sect. 6.4. Table 6 shows the number of events in each selection, where the agreement between the post-fit prediction and the data is notably improved compared to that of the pre-fit prediction, especially for the CC\(0\pi \) events, which comprise the main signal at the FD. There is a consistent rise across all CC\(0\pi \) selections and a small suppression of \(\nu \)-mode \(1\pi ^+\) events, improving agreement with the data. This causes the smaller \(\overline{\nu }\)-mode \(1\pi ^-\) prediction to also be suppressed, since they share parameters in the interaction model, with the neutrino flux and detector uncertainties being more loosely correlated, connected only through their input covariance matrices.

Fig. 14
Comparison of predicted pre-fit (top) and post-fit (bottom) event distributions for the ND FGD1 \(\nu \)-mode \(\nu _{\mu }\) CC\(0\pi \) sample (left) and FGD2 \(\overline{\nu }\)-mode \(\overline{\nu } _{\mu }\) CC\(0\pi \) sample (right). The data and prediction are shown in the reconstructed momentum of the muon candidate, and the simulation is broken down by interaction channel. The bottom insets show the ratio of data to simulation

The observed and predicted \(\nu \)-mode \(\nu _{\mu }\) FGD1 CC\(0\pi \) events projected onto \(p_{\mu }\) are shown in Fig. 14 before and after the fit to data. Before the fit, there is a notable under-prediction which is largest at low \(p_{\mu }\). The fit increases the CCQE and 2p2h components and decreases the \(1\pi \) components in the prediction to agree with the data. For comparison, the \(\overline{\nu }\)-mode \(\overline{\nu } _{\mu }\) FGD2 CC\(0\pi \) selection is also shown in Fig. 14, where there is agreement between the prediction and the data before the fit, which marginally improves after the fit. This showcases the ability of the systematic uncertainty treatment in the analysis to modify and constrain the modelling of neutrino and anti-neutrino interactions on carbon and oxygen separately, and the strength of having a sign-selecting ND.

6.6 Assessing model compatibility with data

A p-value is calculated to assess the compatibility of the model with the data; it represents the probability of obtaining a test statistic equal to or larger than the one observed in data, assuming the model is correct. Simulated data sets, referred to as “toys”, are created by varying the systematic uncertainties in the model according to their input covariances before the ND analysis and applying statistical fluctuations. The model is fit to each toy and the \((-2\ln {\mathscr {L}})_{\textrm{min}}\) is calculated. The p-value is defined as the fraction of the simulated data sets with \((-2\ln {\mathscr {L}})_{\textrm{min}}^{\textrm{Toy}} \ge (-2\ln {\mathscr {L}})_{\textrm{min}}^{\textrm{Data}}.\) An a priori criterion of \(p>0.05\) is required of the ND analysis for the results to be used in the oscillation analysis. Using a total of 895 simulated data sets, \(p=0.74\) is obtained, demonstrating good agreement between the model and the data.
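The p-value definition amounts to counting toys. A minimal sketch follows, assuming the minimised test statistic has already been obtained for each toy and for the data; the function name and the numbers in the example are hypothetical and are not T2K results.

```python
def toy_p_value(toy_min_stats, data_min_stat):
    """Fraction of toys whose minimised -2 ln L is >= the value found in data."""
    if not toy_min_stats:
        raise ValueError("at least one toy is required")
    n_ge = sum(1 for t in toy_min_stats if t >= data_min_stat)
    return n_ge / len(toy_min_stats)


# Invented values for illustration only.
p = toy_p_value([1.0, 2.0, 3.0, 4.0], 2.5)
passes = p > 0.05  # the a priori acceptance criterion quoted in the text
```

With a finite number of toys the p-value itself carries a binomial uncertainty, which shrinks as more toys are generated.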

Breaking down the \((-2\ln {\mathscr {L}})_{\textrm{min}}\) contributions by the likelihoods from the selected samples and systematic uncertainties in Table 7, the selected samples are generally described well with \(p=0.82,\) with individual p-values for the CC\(0\pi \) selections between \(p=0.15{-}0.93.\) Splitting the neutrino flux contributions into \(\nu \)-mode \(\nu _{\mu }\), \(\nu \)-mode \(\overline{\nu } _{\mu }\), \(\overline{\nu }\)-mode \(\nu _{\mu }\) and \(\overline{\nu }\)-mode \(\overline{\nu } _{\mu }\), \(p=0.74,0.74,0.31,0.37\) respectively, showing good compatibility. The cross-section systematics are the worst contributor with \(p=0.01,\) coming predominantly from parameters that are pulled away from their external constraints, e.g. \(M_A^{QE},\) \(M_A^{RES}\) and \(C_5^A.\) When instead varying the systematic uncertainty parameters with respect to their constraints after fitting to data, the cross-section model p-value improves to approximately \(p=0.3.\) This indicates that the cross-section model before the fit to data is unfavourable, but after the fit to data is satisfactory. The near-detector analysis constrains the product of the neutrino flux, ND detector, and neutrino interaction uncertainties, leading to large correlations between the systematic uncertainties, as demonstrated in Fig. 13. Therefore, studying one group’s p-value in isolation from the other is not exact. For this reason, the p-values from the uncertainty parameters do not have to follow the same strict criteria of \(p>0.05.\) However, the low p-value does highlight the need for continued effort in developing realistic neutrino interaction models and associated uncertainties.

7 Far-detector selection

Fig. 15
Reconstruction performance at the FD of stopping cosmic-ray muons and the Michel electrons from their decays. The left panel shows the reconstructed momentum distribution of those electrons for data taken during the SK-IV (blue) and SK-V (red) detector periods. The right is a similar comparison showing the parent muon’s particle ID parameter, which separates events into electron-like (positive values) and muon-like (negative values) categories. The uncertainty on the data points is statistical

The FD event selection in this analysis is the same as used in previous T2K results [1]; only the data have been updated, and the selection is briefly reviewed here. Similarly, the method of evaluating systematic uncertainties related to the FD is unchanged from the previous analysis, where atmospheric events in SK are used to calculate the uncertainties using an MCMC-based approach.

Table 7 p-values comparing the variations of the model before the ND analysis and the model fit to the data, broken down by likelihood contributors, and showing the p-value for all samples, and the total p-value including all samples and all systematic uncertainties

The event reconstruction in SK uses both charge and timing information from hits in the PMTs, and particles are detected using their Cherenkov rings. The vertex position, momentum, and particle type of each ring are reconstructed [133]. Muons and electrons are differentiated by their ring profiles, where muons generally produce “sharper” rings due to less scattering, and electrons produce “fuzzier” rings due to their electromagnetic showers. All samples in this analysis are based on observing one electron-like (1R\(e\)) or muon-like (1R\(\mu \)) primary Cherenkov ring, and a specific number of delayed triggers relative to the primary interaction, consistent with a Michel electron from an unseen charged pion’s decay chain (referred to as decay electron, or “d\(e\)”). Three samples are selected in the \(\nu \)-mode data: a CCQE-like \(\nu _{e}\) sample (\(\nu \)-mode 1R\(e\) with 0 d\(e\)), a CCQE-like \(\nu _{\mu }\) sample (\(\nu \)-mode 1R\(\mu \) with 0 or 1 d\(e\)), and a CC single pion-like \(\nu _{e}\) sample (\(\nu \)-mode 1R\(e\) with 1 d\(e\)). Similarly, there are two single-ring \(\overline{\nu }\)-mode data samples: a CCQE-like \(\overline{\nu } _{e}\) sample (\(\overline{\nu }\)-mode 1R\(e\) with 0 d\(e\)) and a CCQE-like \(\overline{\nu } _{\mu }\) sample (\(\overline{\nu }\)-mode 1R\(\mu \) with 0 or 1 d\(e\)). Unlike the ND, the FD is not magnetised and therefore cannot determine the charge of the outgoing particles.

Since the start of T2K operations in 2009, the gain of the SK inner detector’s PMTs has increased at a rate of at most a few percent per year. In previous T2K analyses, this effect was corrected during the reconstruction stage using a run-by-run global correction factor for all PMTs. However, the gain drift differs based on the PMT production year, and the current analysis adopts a more detailed correction that accounts for these differences. All T2K FD data in this analysis have been reprocessed and reconstructed using the updated correction. The change to the gain correction results in a change in the observed charge available to the reconstruction algorithm relative to previous analyses, even when processing the same event. This may cause small shifts in an event’s reconstructed parameters, including the number of rings, and each ring’s particle type and momentum, which has caused some events to migrate into or out of the oscillation analysis samples with respect to the previous analyses. For the reprocessed run \(1{-}9\) data there are in total 1 more \(\nu \)-mode 1R\(e\), 1 fewer \(\nu \)-mode 1R\(e\)1d\(e\), 1 more \(\overline{\nu }\)-mode 1R\(e\), and 3 fewer \(\overline{\nu }\)-mode 1R\(\mu \) events compared to the previous oscillation analysis. The migration of the events is summarised in Table 8. As the gain correction is applied to data and not to the simulation, the event migration has been cross-checked in both atmospheric neutrino and cosmic-ray muon data samples, which are used to evaluate FD detector uncertainties in the T2K analysis. In both studies, the level of migration was found to be consistent with that observed in the T2K beam data.
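The bookkeeping behind such a migration summary reduces to set operations on event identifiers. The sketch below is a minimal illustration, assuming each event in a sample has a unique identifier; the function name and the identifiers are hypothetical, and the category names follow the caption of Table 8.

```python
def summarise_migration(previous_ids, reprocessed_ids):
    """Count inward, outward and overlapping events between two selections."""
    prev, new = set(previous_ids), set(reprocessed_ids)
    return {
        "inward": len(new - prev),    # selected only after reprocessing
        "outward": len(prev - new),   # selected only before reprocessing
        "overlap": len(prev & new),   # common to both analyses
        "net": len(new) - len(prev),  # net change in the sample size
    }


# Invented identifiers for illustration only.
summary = summarise_migration([1, 2, 3], [2, 3, 4, 5])
```

The net change quoted for a sample is the inward count minus the outward count, so a small net change can hide a larger number of individual migrations.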

Table 8 Summary of event migrations at the FD after reprocessing data from the previous T2K analysis [1, 2]. “Inward” refers to newly added events that were not present in the previous analysis, “outward” refers to events that were lost to the update, and “overlap” refers to the number of events that are common to the two analyses
Fig. 16
Event timing at the FD for fully contained events collected during runs 1–9 and run 10, overlaid with the central value of the expectation from the beam bunch timing structure

Table 9 Predictions for the number of events at the FD using oscillation parameters and systematic uncertainty parameters at their best-fit values whilst varying \(\delta _{\scriptscriptstyle \textrm{CP}}\)
Fig. 17
The events in the full data set for the five FD samples, shown in reconstructed lepton momentum and the angle between the neutrino beam and the lepton in the lab frame. The coloured background in the two-dimensional plot shows the expected number of events from the frequentist analysis, using the best-fit values for the oscillation and systematic uncertainty parameters, applying the reactor constraint on \(\sin ^2\theta _{13}\). The insets show the events projected onto each single dimension, and the red line is the expected number of events from the best-fit. The uncertainty represents the \(1\sigma \) statistical uncertainty on the data

Fig. 18
The number of \(\nu \)-mode 1R\(e\) + 1R\(e\)1d\(e\) versus \(\overline{\nu }\)-mode 1R\(e\) events (top, leading \(\sin \delta _{\scriptscriptstyle \textrm{CP}} \) dependence) and \(\nu \)-mode 1R\(e\) + 1R\(e\)1d\(e\) + \(\overline{\nu }\)-mode 1R\(e\) events above and below \(E_{rec}=550~\text {MeV}\) (bottom, leading \(\cos \delta _{\scriptscriptstyle \textrm{CP}} \) dependence), with the predicted number of events for various sets of oscillation parameters, as shown by the different coloured ellipses. The values for the neutrino mass splitting are from the frequentist analysis of data, where \(\varDelta {}m^2_{32} =2.40\times 10^{-3}~\text {eV}^2\) \((\varDelta {}m^2_{31} =-2.46\times 10^{-3}~\text {eV}^2)\) is the best-fit point in the normal (inverted) ordering. The uncertainties represent the 68% confidence interval for the mean of a Poisson distribution given the observed data point. The underlaid contours contain the predicted number of events for 68% of simulated experiments, varying the systematic uncertainty parameters around the best-fit values from the fit to ND data, and oscillation parameters set to the best-fit values from a fit to data. The overlaid triangle point shows the predicted number of events with both oscillation and systematic uncertainty parameters at their data best-fit values

This analysis is the first to include data following the refurbishment of the FD in 2018, after the detector had been prepared for the gadolinium phase [39] but still using the ultrapure water without gadolinium, referred to as the SK-V period. Following this work, T2K’s run 10 was under slightly different detector conditions than that of the previous data sets. This period had a larger background rate primarily at \({\mathscr {O}}(\text {MeV})\) energies, irrelevant to T2K’s analysis. During the run, the water’s attenuation length, as measured by through-going cosmic-ray muons, was found to be stable above 90 m, consistent with data taken before the refurbishment, albeit slightly longer. This suggests event reconstruction and detector uncertainties should similarly be consistent between the data periods, and several cross-checks were performed to confirm this.

Figure 15 shows such a comparison between stopping cosmic-ray muon data and their Michel electrons taken during the run 9 and run 10 data periods at SK. The similarity of the distributions over both data sets highlights the stability of the detector and reconstruction algorithm following the refurbishment in 2018. Though only the reconstructed Michel momentum distribution and the parent muon’s particle ID parameter are shown in the figure, distributions for other reconstructed parameters used in the T2K event selection showed similar high consistency. Kolmogorov–Smirnov tests of the expected events in run 10 confirmed this. This was true for other calibration data as well as for atmospheric neutrino data, and small differences in these distributions were within current uncertainties.
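For readers unfamiliar with the Kolmogorov–Smirnov comparison, the two-sample statistic is the largest distance between the empirical cumulative distributions of two samples. The sketch below is a generic, unbinned implementation of that statistic; the text does not specify the exact form of the test used in the analysis, so the function name and the choice of an unbinned two-sample form are assumptions.

```python
import bisect


def ks_two_sample_statistic(sample_a, sample_b):
    """Maximum distance between the empirical CDFs of two samples."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d_max = 0.0
    # The supremum is attained at one of the observed values.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max
```

A statistic near 0 indicates similar distributions, whereas a value of 1 means the two samples do not overlap at all; converting the statistic into a p-value additionally requires the sample sizes.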

Good detector stability was also found for the timing and selection of events observed in the T2K beam. The distribution of event times relative to the start of the spill at J-PARC is shown in Fig. 16 for events with minimal outer detector activity, labelled fully-contained events. Events from run 10 showed a 34.2 ns RMS relative to their nearest expected bunch timing (dotted lines in the figure), consistent with that from previous runs.

Amongst the 354 selected fully-contained events in run 10, 75 were selected as 1R\(\mu \), 18 as 1R\(e\), and there were no new 1R\(e\)1d\(e\) events for the analysis described in the next section. The number of events in each selection is presented in Sect. 8, Table 9.

Table 10 Uncertainties on the number of events in each FD sample broken down by source after (before) the fit to ND data. “FD + SI + PN” combines the uncertainties from the FD detector, secondary particle interactions (SI), and photo-nuclear (PN) effects. “Flux\(\otimes \)Interaction” denotes the combined effect from the ND constrained flux and interaction parameters, and the unconstrained interaction parameters. The change in the “FD + SI + PN” uncertainties before and after the ND fit is an indirect effect due to the change of interaction mode fractions in the samples after the ND fit

8 Oscillation analysis

This section presents the three-flavour oscillation analysis from the full data set presented in Fig. 17, including the constraints from the ND analysis in Sect. 6. The analyses at the FD are first introduced, followed by the constraints on the oscillation parameters from the Bayesian and frequentist data analyses in Sects. 8.1 and 8.2, respectively. The comparison of the Bayesian and frequentist analyses is presented in Sect. 8.3, and the new result is put in the context of current world data in Sect. 8.4. The results presented in this section include the uncertainty inflation procedure from simulated data studies mentioned in Sect. 5.3, whose results are discussed in detail later in Sect. 9 and Appendix B.

The impact of \(\delta _{\scriptscriptstyle \textrm{CP}}\) on the number of events in the selections is shown in Table 9, where there is a relatively small sensitivity in the \(\overline{\nu }\)-mode 1R\(e\) selection, and most sensitivity comes from the \(\nu \)-mode 1R\(e\) selection, owing to the number of events in each sample. To summarise the results, the number of observed electron neutrino events is plotted against the observed anti-neutrino events in Fig. 18, where the data favour \(\delta _{\scriptscriptstyle \textrm{CP}} \sim -\pi /2,\) \(\varDelta {}m^2_{32} >0,\) and \(\sin ^2\theta _{23} >0.50;\) i.e. near maximal CP violation, the normal mass ordering, and the upper octant in the PMNS paradigm. The 1R\(e\) +1R\(e\)1d\(e\) events in \(\nu \)-mode and the 1R\(e\) events in \(\overline{\nu }\)-mode are sensitive to \(\sin \delta _{\scriptscriptstyle \textrm{CP}} \), the neutrino mass ordering, and the octant of \(\theta _{23},\) and their energy spectra have some sensitivity to \(\cos \delta _{\scriptscriptstyle \textrm{CP}} \), as illustrated in Fig. 18. Compared to T2K’s previous analysis [1, 2], the data are now closer to the best three-flavour fit prediction, resulting in a slightly weaker constraint on \(\delta _{\scriptscriptstyle \textrm{CP}}\). The weaker constraint is, however, more compatible with the expected sensitivity of the experiment, discussed later in Sect. 8.2.

Fig. 19

Total uncertainty on the reconstructed neutrino energy spectrum in the FD selections before and after the ND analysis of data. The oscillation parameters are set to values near the T2K best-fit point, specified in Appendix B, Table 17

The systematic uncertainties on the predicted number of events before and after the fit to ND data are given in Table 10. After the fit, the total uncertainty is reduced by a factor of 2–5 depending on the sample, with the impact from flux and interaction uncertainties reduced by more than 60%. After the ND fit, the interaction uncertainties are of similar size to the FD detector, pion secondary interaction, and photo-nuclear systematic uncertainties for all samples except the 1R\(e\)1d\(e\), which is dominated by FD detector uncertainties. The FD detector uncertainties characterise the performance of SK and its reconstruction, the pion secondary interaction uncertainties were discussed in Sect. 5.2.5 and are informed by external \(\pi -A\) scattering data, and the photo-nuclear uncertainty arises when Cherenkov photons are absorbed by the nuclei in the FD, causing particles to be mis-reconstructed or entirely missed due to the lack of any Cherenkov rings. Although the impacts from uncertainties in the flux and interaction model are similar for the selections at about 3% when considered separately, they significantly correlate with each other after the fit to ND data, which causes the combined uncertainty from the ND-constrained interaction parameters and the neutrino flux to be smaller than their sum in quadrature.
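The effect of the post-fit correlation on the combined uncertainty can be illustrated with a two-parameter sketch. The 3% values follow the text above; the correlation coefficient of -0.5 is purely an illustrative assumption and is not a value taken from the analysis.

```python
import math

def combined_uncertainty(sigma_a, sigma_b, rho):
    """Combine two fractional uncertainties with correlation coefficient rho."""
    return math.sqrt(sigma_a**2 + sigma_b**2 + 2.0 * rho * sigma_a * sigma_b)

sigma_flux = 0.03  # ~3% flux uncertainty (from the text)
sigma_xsec = 0.03  # ~3% interaction uncertainty (from the text)
rho = -0.5         # hypothetical anti-correlation after the ND fit

uncorrelated = combined_uncertainty(sigma_flux, sigma_xsec, 0.0)
correlated = combined_uncertainty(sigma_flux, sigma_xsec, rho)
print(f"quadrature sum: {uncorrelated:.4f}, with rho={rho}: {correlated:.4f}")
```

Any negative correlation makes the combined value smaller than the uncorrelated quadrature sum, which is the behaviour described for the ND-constrained flux and interaction parameters.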

These constraints are used to build the predictions for the FD energy spectra including all uncertainties, as shown in Fig. 19. The five lower-\(Q^2\) parameters have no external constraints, and the expected sensitivity from a FD-only fit (excluding the ND) is used as the uncertainty. This is solely for the purpose of providing a representative uncertainty on the events when an ND fit is not used, and this uncertainty is not used elsewhere in the analysis.

The degrees of freedom from the oscillation parameters are of the form \(\sin ^{2}\theta _{ij},\) \(\varDelta m^{2}_{ij},\) and \(\delta _{\scriptscriptstyle \textrm{CP}}\). T2K is not sensitive to the “solar” oscillation parameters \(\sin ^2\theta _{12}\) and \(\varDelta {}m^2_{21}\), therefore constraints from the world averages reported in PDG 2019 [11] are imposed (footnote 2), where the frequentist analysis fixes the parameters and the Bayesian analysis accounts for their uncertainties. An additional constraint may be imposed on \(\sin ^2\theta _{13}\) from the world average reported in PDG 2019 [11], referred to as the “reactor constraint” (footnote 3). The reactor constraint has a significant effect on the sensitivity to other oscillation parameters of interest, notably \(\delta _{\scriptscriptstyle \textrm{CP}}\). Accordingly, results are presented with and without this constraint applied. The reactor constraint is applied as a Gaussian penalty to the test statistic for both the frequentist and Bayesian analyses.
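A Gaussian penalty of this kind can be sketched as an additive term on a \(\chi ^2\)-like test statistic. The function names are hypothetical, and the central value and width below are placeholders standing in for the external constraint; they are not quoted from this section.

```python
def gaussian_penalty(value, central, sigma):
    """Chi-square-like penalty, i.e. -2 ln of an unnormalised Gaussian."""
    return ((value - central) / sigma) ** 2

def constrained_test_statistic(chi2_data, value, central, sigma):
    """Data term plus the external-constraint penalty."""
    return chi2_data + gaussian_penalty(value, central, sigma)

# Placeholder constraint on sin^2(theta_13): central value and 1-sigma width.
CENTRAL, SIGMA = 0.0218, 0.0007

# A trial value one sigma away from the central value adds one unit.
print(constrained_test_statistic(10.0, CENTRAL + SIGMA, CENTRAL, SIGMA))
```

Dropping the penalty term corresponds to the results quoted without the reactor constraint.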

8.1 Bayesian results

Fig. 20

Marginalised posterior probability densities from the Bayesian analysis for oscillation parameters of interest from a fit to data with the reactor constraint on \(\sin ^2\theta _{13}\) applied. The two-dimensional posteriors have 68% (dashed) and 90% (solid) credible levels indicated and the point with highest posterior probability. The one-dimensional posteriors have 68%, 90%, and 95% credible intervals indicated in different shades of grey. All credible regions are calculated from marginalising over both mass orderings, although panels displaying \(\varDelta {}m^2_{32}\) show only the portions of the distributions in the normal mass ordering \((\varDelta {}m^2_{32} >0)\)

The Bayesian results presented in this section are obtained by sampling the posterior distributions through MCMC [126, 127] analysis, using the ND and FD selections simultaneously. The MCMC analysis presented in Sect. 6 is utilised for the ND. The e-like samples use both the reconstructed angle between the outgoing lepton and the mean neutrino direction, and the reconstructed neutrino energy assuming a CCQE interaction and a struck nucleon at rest (Eq. 4). For the 1R\(e\)1d\(e\) selection – which is dominated by \(1e^{-}1\pi ^+\) final states – the nucleon mass is replaced by the \(\varDelta (1232)\) mass. The \(\mu \)-like samples only use the reconstructed neutrino energy assuming a CCQE interaction. The posterior probability at the FD first includes the product of Poisson probabilities for observing the number of events in the data given the model prediction per bin across all samples. A Gaussian multivariate distribution is used to include the effect of external constraints on the systematic uncertainty parameters. The general form of the likelihood is the same as the ND analysis, presented in Eq. 6, but excludes the statistical uncertainty on the simulation for the FD.
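The two ingredients named above, a product of per-bin Poisson probabilities and a multivariate Gaussian constraint, can be sketched as follows. This is a minimal illustration and not a reproduction of Eq. 6; the function names, the likelihood-ratio form of the Poisson term, and the toy numbers are assumptions.

```python
import math

def poisson_neg2_log_likelihood(observed, predicted):
    """-2 ln L for independent Poisson bins, in likelihood-ratio form.

    Every predicted bin content must be strictly positive.
    """
    total = 0.0
    for n_obs, n_pred in zip(observed, predicted):
        total += 2.0 * (n_pred - n_obs)
        if n_obs > 0:
            total += 2.0 * n_obs * math.log(n_obs / n_pred)
    return total

def gaussian_constraint(params, central, inv_cov):
    """(x - mu)^T V^-1 (x - mu) for the systematic parameters."""
    diff = [p - c for p, c in zip(params, central)]
    n = len(diff)
    return sum(diff[i] * inv_cov[i][j] * diff[j]
               for i in range(n) for j in range(n))

# Toy inputs (hypothetical): three bins and two systematic parameters.
observed = [3, 0, 5]
predicted = [2.5, 0.4, 5.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
total = (poisson_neg2_log_likelihood(observed, predicted)
         + gaussian_constraint([1.0, 0.0], [0.0, 0.0], identity))
print(round(total, 3))
```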

Credible regions are extracted from lower dimensional marginalised posterior distributions for parameters of interest by adding up the highest probability density region until a certain fraction of the distribution is captured. Flat priors are used over the entire ranges of \(\sin ^2\theta _{23}\), \(\varDelta {}m^2_{32}\), \(\delta _{\scriptscriptstyle \textrm{CP}}\) (or \(\sin \delta _{\scriptscriptstyle \textrm{CP}} \)), and Gaussian priors are applied on \(\varDelta {}m^2_{21}\) and \(\sin ^2\theta _{12}\). For \(\sin ^2\theta _{13}\) either a flat or a Gaussian prior is applied via the aforementioned reactor constraint. The priors for normal and inverted orderings are the same, namely 50%.
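The highest-density construction described above can be sketched for a binned, marginalised posterior. The histogram below is hypothetical and the function name is illustrative.

```python
def highest_density_bins(densities, fraction):
    """Indices of the bins forming the highest-posterior-density region.

    Bins are added in order of decreasing density until the requested
    fraction of the total probability is captured.
    """
    total = sum(densities)
    order = sorted(range(len(densities)),
                   key=lambda i: densities[i], reverse=True)
    selected, accumulated = [], 0.0
    for i in order:
        selected.append(i)
        accumulated += densities[i]
        if accumulated >= fraction * total:
            break
    return sorted(selected)

# Hypothetical binned posterior (arbitrary normalisation).
posterior = [1, 4, 10, 20, 30, 20, 10, 4, 1]
region_68 = highest_density_bins(posterior, 0.68)
region_95 = highest_density_bins(posterior, 0.95)
print(region_68, region_95)
```

For a multi-modal posterior the selected bins need not be contiguous, so a credible region built this way can consist of several disjoint intervals.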

Figure 20 shows several marginalised posterior distributions for oscillation parameters of interest. Two-dimensional distributions for every combination of the four oscillation parameters of interest are shown with the \(68\%\) and \(90\%\) credible intervals in dashed and solid lines, respectively. Each two-dimensional posterior distribution also shows the point of highest probability density. Marginalised one-dimensional posterior probability distributions are also given for each of the four oscillation parameters with \(68\%,\) \(90\%,\) and \(95\%\) credible intervals in different shades of grey.

Fig. 21

68% and 90% credible intervals from the marginalised \(\sin ^2\theta _{23}-\varDelta {}m^2_{32} \) posterior distribution with (red) and without (blue) the reactor constraint applied. The top (bottom) shows the proportion of probability density in the normal (inverted) mass ordering

8.1.1 Atmospheric oscillation parameters

The effect of applying the reactor constraint on the \(\sin ^2\theta _{23}-\varDelta {}m^2_{32} \) contours is shown in Fig. 21. Applying the constraint increases the probability density in the upper octant and the normal neutrino mass ordering. The marginalised posterior probability distribution of \(\sin ^2\theta _{23}\) with and without the reactor constraint is shown in Fig. 22. The posterior probabilities are largely overlapping, with a preference for the upper octant when using the reactor constraint, and there is barely any octant preference without the reactor constraint.

The results for the atmospheric parameters are summarised in Table 11, showing the proportion of the posterior probability that lies in the different mass orderings and \(\theta _{23}\) octant, with and without the reactor constraint. A flat prior distribution on both \(\varDelta {}m^2_{32}\) and \(\sin ^2\theta _{23}\) is equivalent to comparing the likelihood that T2K’s data is described by the different choices of hypotheses. The analysis with (without) the reactor constraint sees a Bayes factor (BF) of 3.35 (1.43) for the upper over the lower \(\theta _{23}\) octant; 4.21 (1.83) for the normal over inverted mass ordering; and a combined factor of 1.58 (0.63) for upper \(\theta _{23}\) octant and normal ordering. When calculating the BFs, the alternate hypothesis is any other combination of octant and mass ordering. Interpreting the largest BFs with the Jeffreys’ scale, there is substantial evidence for the normal ordering when marginalising over the octant, and substantial evidence for the upper octant when marginalising over the mass ordering. In the more recent interpretation of BFs by Kass and Raftery [134], these both correspond to positive evidence. Importantly, the Jeffreys and Kass–Raftery definitions of “evidence” do not equate to the criteria often used in particle physics. For instance, a probability of 95.4% (“\(2\sigma \)”) is equivalent to a BF of 20.7, which is deemed “strong” on both the Jeffreys and Kass–Raftery scales.
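The conversion between a posterior probability and a BF used in the example above can be sketched for a hypothesis tested against its complement. Equal prior probabilities are assumed by default, in which case the BF reduces to the posterior odds; the function names are illustrative.

```python
def bayes_factor(posterior_prob, prior_prob=0.5):
    """Posterior odds divided by prior odds, hypothesis versus complement."""
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    prior_odds = prior_prob / (1.0 - prior_prob)
    return posterior_odds / prior_odds

def posterior_prob_from_bf(bf, prior_prob=0.5):
    """Inverse relation: posterior probability implied by a Bayes factor."""
    odds = bf * prior_prob / (1.0 - prior_prob)
    return odds / (1.0 + odds)

print(round(bayes_factor(0.954), 1))            # the "2 sigma" example: 20.7
print(round(posterior_prob_from_bf(4.21), 3))   # BF of 4.21 under equal priors
```

Under these assumptions a BF of 4.21 corresponds to a posterior probability of roughly 0.81, well short of the 95.4% of the \(2\sigma \) example.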

Fig. 22

The marginalised posterior probability density of \(\sin ^2\theta _{23}\) with (red) and without (blue) the reactor constraint on \(\sin ^2\theta _{13}\) applied. The shaded areas show the 68% and 95% regions of highest posterior density, equivalent to the 1\(\sigma \) and 2\(\sigma \) credible intervals

8.1.2 The CP-violating phase \(\delta _{\scriptscriptstyle \textrm{CP}}\), and \(\sin ^2\theta _{13}\)

A comparison of \(\sin ^2\theta _{13}-\delta _{\scriptscriptstyle \textrm{CP}} \) contours with and without the reactor constraint is shown in Fig. 23. The regions are in good agreement, with a majority of the 1\(\sigma \) regions overlapping, and the T2K-only constraint on \(\sin ^2\theta _{13}\) is compatible with the reactor constraint. A comparison of the \(\delta _{\scriptscriptstyle \textrm{CP}}\) posterior distributions is shown in Fig. 24, showing the impact of the reactor constraint on T2K’s \(\delta _{\scriptscriptstyle \textrm{CP}}\) result. The external constraint breaks the partially degenerate effects of \(\sin ^2\theta _{13}\) and \(\delta _{\scriptscriptstyle \textrm{CP}}\) on the \(\nu _{e}\) appearance, leading to the \(\nu \)-mode 1R\(e\) and 1R\(e\)1d\(e\) selections having a larger sensitivity to \(\delta _{\scriptscriptstyle \textrm{CP}}\).

8.1.3 The Jarlskog invariant

The sampled posterior probability density is in part a function of the PMNS mixing angles and \(\delta _{\scriptscriptstyle \textrm{CP}},\) which means the probability distribution for the Jarlskog invariant [22, 23],

$$\begin{aligned} J=\sin \theta _{13} \cos ^2\theta _{13} \sin \theta _{12} \cos \theta _{12} \sin \theta _{23} \cos \theta _{23} \sin \delta _{\scriptscriptstyle \textrm{CP}} \end{aligned} \tag{8}$$

can be extracted directly from the steps in the MCMC. The posterior distribution for J is presented in Fig. 25, which favours a near-maximal negative J. The prior probability distribution is largely flat in the range \(J=[-0.035,0.035],\) with the fall-off beyond that coming from external \(\theta _{12}\) and \(\theta _{13}\) constraints. The preference for \(\sin ^2\theta _{23}\) values near maximal mixing has the effect of picking out the more extreme values of J. When sampling the full posterior probability, which incorporates the \(\delta _{\scriptscriptstyle \textrm{CP}}\) constraint, a preference for negative values of J emerges. The blue curve in Fig. 25 is recreated in Fig. 26 showing the \(1\sigma ,\) \(2\sigma ,\) and \(3\sigma \) credible intervals. Two-dimensional credible regions for the Jarlskog invariant against both \(\sin ^2\theta _{23}\) and \(\delta _{\scriptscriptstyle \textrm{CP}}\) are included in Appendix C.
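Evaluating Eq. (8) at each MCMC step can be sketched as below. The step values are hypothetical illustrations chosen near commonly quoted mixing-angle values, not samples from the T2K chain, and the function name is an assumption.

```python
import math

def jarlskog(sin2_th12, sin2_th13, sin2_th23, delta_cp):
    """Jarlskog invariant of Eq. (8) from sin^2 of the mixing angles."""
    s12 = math.sqrt(sin2_th12)
    c12 = math.sqrt(1.0 - sin2_th12)
    s13 = math.sqrt(sin2_th13)
    c13_sq = 1.0 - sin2_th13
    s23 = math.sqrt(sin2_th23)
    c23 = math.sqrt(1.0 - sin2_th23)
    return s13 * c13_sq * s12 * c12 * s23 * c23 * math.sin(delta_cp)

# Hypothetical MCMC steps: (sin2_th12, sin2_th13, sin2_th23, delta_cp).
steps = [
    (0.307, 0.0218, 0.50, -math.pi / 2),
    (0.307, 0.0218, 0.56, -2.0),
    (0.307, 0.0218, 0.45, 0.0),
]
j_values = [jarlskog(*step) for step in steps]
print([round(j, 4) for j in j_values])
```

With maximal \(\theta _{23}\) mixing and \(\delta _{\scriptscriptstyle \textrm{CP}} =-\pi /2\) the first step gives \(J\approx -0.033\), consistent with the quoted range over which the prior is largely flat.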

Although this analysis does not rule out CP-conserving values of \(\delta _{\scriptscriptstyle \textrm{CP}}\) at \(2\sigma ,\) it does rule out \(J=0\) at the \(2\sigma \) level and excludes the \(J>0.17\) region at \(>3\sigma \) with a flat prior probability in \(\delta _{\scriptscriptstyle \textrm{CP}}\). The dependence of J on the choice of a prior flat in \(\delta _{\scriptscriptstyle \textrm{CP}}\) or flat in \(\sin \delta _{\scriptscriptstyle \textrm{CP}} \) is shown in Fig. 26. The prior flat in \(\sin \delta _{\scriptscriptstyle \textrm{CP}} \) flattens out the Jarlskog distribution, which in turn slightly expands the 2\(\sigma \) credible interval to where \(J=0\) is just included. These conclusions agree with previous studies on the impact of the \(\delta _{\scriptscriptstyle \textrm{CP}}\) prior at T2K [1, 2].

Table 11 Fractions of posterior probability in different combinations of the mass ordering and \(\theta _{23}\) octant from fit to T2K data with (without) the reactor constraint on \(\sin ^2\theta _{13}\). NO (IO) refers to the normal (inverted) neutrino mass ordering
Fig. 23

68% and 90% credible intervals from the marginalised \(\sin ^2\theta _{13}-\delta _{\scriptscriptstyle \textrm{CP}} \) posterior distribution with (red) and without (blue) the reactor constraint (green band) applied, marginalised over both mass orderings

Fig. 24

The marginalised posterior probability density of \(\delta _{\scriptscriptstyle \textrm{CP}}\) with (red) and without (blue) the reactor constraint applied. The shaded areas show the 68% and 95% regions of highest posterior density, equivalent to the 1\(\sigma \) and 2\(\sigma \) credible intervals

Fig. 25

Posterior probability distributions for the Jarlskog invariant using a prior distribution from the 2019 PDG reactor constraint on \(\theta _{13}\) [11] (red), prior from all parameters except sampling \(\theta _{23}\) from the T2K posterior (green), and the full T2K posterior (blue). All three posterior probabilities used a prior probability distribution flat in \(\delta _{\scriptscriptstyle \textrm{CP}}\)

Fig. 26

Posterior probability distributions for the Jarlskog invariant taken from posterior distributions with priors that are either flat in \(\delta _{\scriptscriptstyle \textrm{CP}}\) (blue) or flat in \(\sin \delta _{\scriptscriptstyle \textrm{CP}} \) (orange). 1\(\sigma ,\) 2\(\sigma ,\) and 3\(\sigma \) credible intervals are shown as the region between the vertical black solid line and the specified vertical dashed lines

8.1.4 Goodness-of-fit analysis

Predictions for the five samples at the FD are shown in Fig. 27, using the posterior probability distributions for the systematic uncertainties and oscillation parameters from the fit to data. By eye, the predictions agree well with the data, which are plotted as orange data points with statistical uncertainties applied. To quantify the model agreement with the data, the posterior predictive p-values [135] are calculated. These p-values can be calculated using either the total number of events per sample (rate-based) or the events per bin of each sample (shape-based). They can also be split by sample, or calculated as a total p-value. When including all samples, the shape-based and rate-based approaches give \(p=0.73\) and \(p=0.30\) respectively. The p-values from both shape- and rate-based calculations broken down by sample and in total are tabulated in Table 12. Good p-values are demonstrated for all cases.
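One common way to compute a shape-based posterior predictive p-value is sketched below: for each posterior draw, the prediction is statistically fluctuated and its fit quality is compared with that of the data. This is an assumed construction for illustration and may differ in detail from the procedure used in the analysis; the toy predictions, the Poisson sampler, and all function names are hypothetical.

```python
import math
import random

def poisson_draw(mean, rng):
    """Knuth's algorithm; adequate for the small means used here."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def neg2_log_lr(observed, predicted):
    """Binned Poisson -2 ln(likelihood ratio)."""
    total = 0.0
    for n, mu in zip(observed, predicted):
        total += 2.0 * (mu - n)
        if n > 0:
            total += 2.0 * n * math.log(n / mu)
    return total

def posterior_predictive_p_value(data, posterior_predictions, rng):
    """Fraction of draws where a fluctuated prediction fits worse than data."""
    worse = 0
    for prediction in posterior_predictions:
        fluctuated = [poisson_draw(mu, rng) for mu in prediction]
        if neg2_log_lr(fluctuated, prediction) > neg2_log_lr(data, prediction):
            worse += 1
    return worse / len(posterior_predictions)

rng = random.Random(1)
# Toy posterior: 2000 predictions scattered by 5% around a nominal spectrum.
nominal = (4.5, 6.0, 3.5)
predictions = [[mu * rng.gauss(1.0, 0.05) for mu in nominal]
               for _ in range(2000)]
p_typical = posterior_predictive_p_value([4, 7, 3], predictions, rng)
p_extreme = posterior_predictive_p_value([40, 70, 30], predictions, rng)
print(p_typical, p_extreme)
```

A rate-based variant would apply the same comparison to the summed event counts of each sample rather than to the individual bins.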

Fig. 27

The reconstructed neutrino energy distributions of each FD sample. Data with Poisson uncertainties are shown in orange and the distributions of the predictions are shown in the coloured background, with the mean of those distributions overlaid in red. The z-axis represents the number of MCMC samples that had a prediction in a specific bin, and its intensity is directly proportional to the probability. The predictions are built by sampling both the nuisance and oscillation parameters from the posterior probability distribution in the Bayesian analysis. \(\sin ^2\theta _{13}\) is constrained from T2K data alone with no reactor constraint applied

8.2 Frequentist results

As in previous T2K analyses, the frequentist results are obtained using the marginal likelihood \({\mathscr {L}}_{\textrm{marg}}(\theta )\) \(=\) \(\int {\textrm{d}}\eta \,p(\eta )\) \({\mathscr {L}}(\theta ,\eta )\) as the test statistic. Here, \({\mathscr {L}}(\theta ,\eta )\) is the binned Poisson likelihood for the parameter of interest, \(\theta ,\) and the nuisance parameters, \(\eta .\) The statistical treatment of nuisance parameters in the fit is thus identical to the Bayesian analysis and assumes a prior probability distribution \(p(\eta ).\) The numerical integration is performed by varying systematic uncertainties with a Gaussian covariance matrix from the ND analysis in Sect. 6 as a constraint, and varying the other oscillation parameters with a flat prior probability distribution on \(\sin ^2\theta _{23}\), \(\delta _{\scriptscriptstyle \textrm{CP}}\), \(\varDelta {}m^2_{32}\), and \(\sin ^2 2\theta _{13}\), or a Gaussian prior on \(\sin ^2 2\theta _{13}\). Confidence intervals and regions are constructed with two different methods. For critical parameters with known boundary effects, the Feldman–Cousins (FC) method [136] is utilised to calculate the coverage. This is performed for the result using the reactor constraint, on the one-dimensional confidence intervals in \(\delta _{\scriptscriptstyle \textrm{CP}}\) and \(\sin ^2\theta _{23}\), and their joint confidence region. For generating the ensemble of experiments for FC evaluation, the nuisance oscillation parameters are varied from the posterior distribution obtained by fitting a representative simulated data set, sometimes referred to as “Asimov data”. This simulated data set is generated at the global best-fit point using the reactor constraint. 
Since the FC method is computationally intensive, the remaining confidence regions are constructed using constant \(\varDelta \chi ^2(\theta ) = \chi ^2(\theta ) - \min _{\theta '} \chi ^2(\theta ')\) values via Wilks’s theorem [137], where \(\chi ^2 = -2\ln {\mathscr {L}}_{\textrm{marg}}.\) Whether \(\varDelta \chi ^2\) is computed with respect to the minimum over both mass orderings, or the minimum in each mass ordering separately, is indicated in each of the results from the frequentist analysis. The frequentist analysis bins the e-like FD samples in reconstructed lepton angle and reconstructed lepton momentum, and the \(\mu \)-like samples in reconstructed lepton angle and the reconstructed neutrino energy, defined in the same way as in the Bayesian analysis presented in Sect. 8.1. In previous analyses, the \(\mu \)-like samples were binned only in reconstructed neutrino energy, and adding the lepton angle information increases the \(1\sigma \) expected sensitivity to \(\varDelta {}m^2_{32}\) by \({\mathscr {O}}(1\%).\)
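The constant \(\varDelta \chi ^2\) construction can be sketched as follows: under Wilks’s theorem the critical value follows a \(\chi ^2\) distribution with as many degrees of freedom as parameters of interest. The toy parabola and the function names are assumptions for illustration only.

```python
import math
from statistics import NormalDist

def delta_chi2_threshold(confidence_level, n_params):
    """Critical delta-chi2 from Wilks's theorem for 1 or 2 parameters."""
    if n_params == 1:
        z = NormalDist().inv_cdf(0.5 * (1.0 + confidence_level))
        return z * z
    if n_params == 2:
        return -2.0 * math.log(1.0 - confidence_level)
    raise ValueError("only 1 or 2 parameters are handled in this sketch")

def interval_points(grid, chi2_values, confidence_level):
    """Grid points whose delta-chi2 lies below the 1D critical value."""
    best = min(chi2_values)
    cut = delta_chi2_threshold(confidence_level, 1)
    return [x for x, c in zip(grid, chi2_values) if c - best <= cut]

# Toy parabolic chi2 with minimum at 0.5 and width 0.1 (hypothetical).
grid = [0.30 + 0.05 * i for i in range(9)]
chi2_values = [((x - 0.5) / 0.1) ** 2 for x in grid]
accepted_90 = interval_points(grid, chi2_values, 0.90)
print(round(delta_chi2_threshold(0.90, 1), 2),
      round(delta_chi2_threshold(0.90, 2), 2), len(accepted_90))
```

Near physical boundaries, such as maximal mixing, the \(\varDelta \chi ^2\) distribution can deviate from this asymptotic form, which is why the FC method is used for the critical parameters.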

Table 12 Breakdown of posterior predictive p-values by sample, quoted separately using a shape or rate based calculation, demonstrating good compatibility between the model and the data
Table 13 Results for the oscillation parameters from the fit to data with and without the reactor constraint in the frequentist analysis, with the confidence intervals estimated using the constant \(\varDelta \chi ^2\) method

Global best-fit values are given in Table 13. As noted in the Bayesian section, the results with and without the reactor constraint are compatible, with the former resulting in stronger constraints on \(\delta _{\scriptscriptstyle \textrm{CP}}\) and \(\sin ^2\theta _{23}\). All the following results are from the fit to data using the reactor constraint.

8.2.1 The CP-violating phase \(\delta _{\scriptscriptstyle \textrm{CP}}\), and mass ordering

Figure 28 shows the \(\varDelta \chi ^2\) distributions for \(\delta _{\scriptscriptstyle \textrm{CP}}\) in both mass orderings with FC-adjusted confidence intervals, which are also summarised in Table 14. A large region of \(\sin \delta _{\scriptscriptstyle \textrm{CP}} > 0\) is excluded at \(>3\sigma \) confidence level (CL), whereas the CP-conserving values \(\delta _{\scriptscriptstyle \textrm{CP}} =0,\pi \) are excluded at 90% CL. In particular, \(\delta _{\scriptscriptstyle \textrm{CP}} = \pi \) is just inside the \(2\sigma \) interval.

Table 14 FC-corrected confidence intervals for \(\delta _{\scriptscriptstyle \textrm{CP}}\) and \(\sin ^2\theta _{23}\) from the fit to data in the frequentist analysis, using the reactor constraint on \(\sin ^2 2\theta _{13}\). The \(3\sigma \) FC correction was not computed for \(\sin ^2\theta _{23}\)
Fig. 28

The \(\varDelta \chi ^2\) distribution in \(\delta _{\scriptscriptstyle \textrm{CP}}\) from fitting to the data with the reactor constraint applied. The confidence intervals in the shaded regions are calculated using the FC method

Fig. 29

The \(\varDelta \chi ^2\) distribution in \(\delta _{\scriptscriptstyle \textrm{CP}}\) for incremental modifications from the previous analysis [1] to this result, for normal and inverted mass orderings. “E” corresponds to this analysis, except that unlike the main frequentist result, the \(\mu \)-like samples do not use the scattering angle information for better compatibility with the previous T2K analysis

Fig. 30

The \(\varDelta \chi ^2\) distribution in \(\delta _{\scriptscriptstyle \textrm{CP}}\) from fitting to the data assuming normal (left) and inverted (right) neutrino mass ordering, with the reactor constraint applied. The distribution is overlaid with the expectations from an ensemble of toy simulated experiments created with true normal ordering and \(\delta _{\scriptscriptstyle \textrm{CP}} = -\pi /2,\) showing the \(\varDelta \chi ^2\) for 68% and 95% of the toys, and their median

As was also seen in the Bayesian analysis, the constraint on \(\delta _{\scriptscriptstyle \textrm{CP}}\) is weaker compared to T2K’s previous analysis [1, 2]. Figure 29 shows the impact on the \(\varDelta \chi ^2\) of \(\delta _{\scriptscriptstyle \textrm{CP}}\) after each update introduced in this analysis, all of which weaken the \(\delta _{\scriptscriptstyle \textrm{CP}}\) constraint, with the largest contribution being the addition of the latest data in run 10 at the FD. The data is now more consistent with the expectation, shown in Fig. 30 for both normal and inverted ordering. In most of the \(\delta _{\scriptscriptstyle \textrm{CP}}\) parameter space, T2K is below the upper limit of the 68% expectation band of ensemble experiments at maximal CP violation. The inverted mass ordering is disfavoured at more than \(1\sigma \) for all \(\delta _{\scriptscriptstyle \textrm{CP}}\) values, mostly consistent with the expected sensitivity at \(\sin \delta _{\scriptscriptstyle \textrm{CP}} = -1.\) Replacing each sample’s event distribution in data by the expectation of the model shows that the stronger \(\delta _{\scriptscriptstyle \textrm{CP}}\) constraint observed in data compared to the expectation comes from the \(\nu \)-mode 1R\(e\)1d\(e\) sample.

Fig. 31

Confidence regions in \(\sin ^2\theta _{23}-\varDelta {}m^2_{32} \) (\(|\varDelta m^2_{31}|\) in inverted mass ordering) for the data fit with the reactor constraint applied, obtained with the constant \(\varDelta \chi ^2\) method, where in each mass ordering hypothesis a fixed mass ordering is assumed

8.2.2 Atmospheric oscillation parameters

The \(\sin ^2\theta _{23}-\varDelta {}m^2_{32} \) confidence intervals are presented in Fig. 31. The contours are compatible for both mass orderings, with a slight shape change compared to the previous T2K analysis, to now marginally prefer the upper octant. Figure 32 shows the contour from a fit using only the 1R\(\mu \) samples, which shows that the constraint is dominated by the 1R\(\mu \) samples, with the 1R\(e\) samples providing the sensitivity to the octant of \(\theta _{23}.\)

The evolution of the \(\sin ^2\theta _{23}-\varDelta {}m^2_{32} \) contour from the fit to data after introducing each update in the analysis is shown in Fig. 33. The most significant impact on the \(\varDelta {}m^2_{32}\) constraint comes from changing the cross-section model and updating the ND constraint. For \(\varDelta {}m^2_{32}\), improvements in the removal energy uncertainty have significantly reduced the uncertainty before the smearing based on simulated data studies has been applied. Thanks to increased robustness of the uncertainty model, the size of the smearing has also been reduced by a factor of 2.8, discussed in detail in Sect. 9. For \(\sin ^2\theta _{23}\), there is a slight shift in shape from the latest data in the FD. The new data favour a slightly larger \(\sin ^2\theta _{23}\), and so push less against the boundary of maximal mixing. This results in a slightly weaker constraint in the lower octant and similar constraint in the upper octant compared to the previous analysis.

Fig. 32

Comparison of confidence regions in \(\sin ^2\theta _{23}- \varDelta {}m^2_{32} \) for normal ordering, between a full fit and a fit using only the \(\mu \)-like samples. The intervals are calculated with the constant \(\varDelta \chi ^2\) method, and applying the reactor constraint

Fig. 33

The \(\varDelta \chi ^2\) distribution in \(\varDelta {}m^2_{32}\) and \(\sin ^2\theta _{23}\) for normal ordering, showing incremental modifications of the previous analysis [1, 2] to this result. “E” corresponds to this analysis, except that unlike the main frequentist result, the \(\mu \)-like samples do not use the scattering angle information for better compatibility with the previous analysis. The best-fit point for “C” in orange is the same as “D” in green

The \(\varDelta \chi ^2\) distribution for \(\sin ^2\theta _{23}\) is shown in Fig. 34 with the confidence intervals summarised in Table 14. The new data at the FD have reduced compatibility with maximal mixing, which is now outside the FC-corrected \(1\sigma \) confidence interval. Whilst the upper octant is favoured at \(1\sigma \) CL, the data are still compatible with both octant hypotheses at 90% CL. These results are compatible with the sensitivity, shown in Fig. 35.

Fig. 34

The \(\varDelta \chi ^2\) distribution in \(\sin ^2\theta _{23}\) for fitting to the data with the reactor constraint applied. The confidence intervals in the shaded regions are calculated using the FC method

Fig. 35

The \(\varDelta \chi ^2\) distribution in \(\sin ^2\theta _{23}\) for fitting to the data with the reactor constraint applied, overlaid with the expected distributions from an ensemble of simulated experiments created with true NO and \(\sin ^2\theta _{23} = 0.56\)

Fig. 36

Comparison of the 68% and 90% confidence intervals from fits to data from the Bayesian analysis (“Analysis A”) and the frequentist analysis (“Analysis B”), discussed in Sect. 8.3. “Analysis B, A-like” configures the frequentist analysis in the same way as the Bayesian analysis, using the same binning at the FD and the same MCMC-based ND analysis. The contours are extracted from fits that fix the neutrino mass ordering to the normal ordering and apply the reactor constraint on \(\sin ^2\theta _{13}\)

8.3 Cross-fitter comparisons

To compare the consistency of the Bayesian analysis described in Sect. 8.1 and the frequentist analysis described in Sect. 8.2, the Bayesian posterior distributions were recast into frequentist \(\varDelta \chi ^{2}\) distributions comparable to the frequentist analysis. Figure 36 shows comparisons of \(\sin ^2\theta _{23}-\varDelta {}m^2_{32} \) and \(\sin ^2\theta _{13}-\delta _{\scriptscriptstyle \textrm{CP}} \) contours from fits to data from both frameworks. The minor differences between the resulting contours can be attributed to two distinct analysis choices: the way in which the constraints from the near-detector analysis on the systematic uncertainties are applied, and the choice of kinematic variables in which the far detector samples are binned. The Bayesian analysis uses a ND analysis with irregular rectangular binning for the ND samples to better adapt to differences in \(p_{\mu }-\cos \theta _{\mu }\) phase space density, whereas the frequentist analysis’ ND constraint uses regular binning. Both analyses use the lepton scattering angle for the e-like samples, but differ in the use of reconstructed energy (Bayesian) or reconstructed lepton momentum (frequentist) for those samples. Additionally, the frequentist analysis uses the reconstructed lepton scattering angle for the \(\mu \)-like samples to disentangle systematic uncertainties related to energy scale and mis-reconstructed backgrounds at the FD. Other differences, like the non-Gaussian nature of parameters included in the ND constraint or event-by-event vs. binned oscillation probability calculation, had little effect. When the frequentist analysis is configured to impose the constraints from the ND MCMC analysis and bin the FD samples similarly to the Bayesian analysis, these minor differences abate, as shown in Fig. 36. The uncertainty models of both analyses were validated against each other for consistency and were found to agree.

Fig. 37

Comparison of the 90% confidence regions in \(\sin ^2\theta _{23}-\varDelta {}m^2_{32} \) for normal ordering with NOvA  [138], Super-K [139], IceCube [140], and MINOS+ [141]. The NOvA and IceCube constraints are obtained with the FC method, but with different treatment of the mass ordering: NOvA takes the minimum over both mass orderings, whereas the IceCube contours assume normal ordering. The T2K, Super-K, and MINOS+ contours are computed with the constant \(\varDelta \chi ^2\) method, assuming normal ordering

8.4 Comparisons with other experiments

In the global context of neutrino oscillation experiments, these results provide leading constraints on both the atmospheric oscillation parameters, \(\varDelta {}m^2_{32}\) and \(\sin ^2\theta _{23}\), and the CP-violating phase, \(\delta _{\scriptscriptstyle \textrm{CP}}\). Whereas the other experiments profile over parameters to calculate the \(\varDelta \chi ^2,\) T2K instead calculates the marginal likelihood for the \(\varDelta \chi ^2.\) Figure 37 shows the 90% confidence regions in \(\sin ^2\theta _{23}-\varDelta {}m^2_{32} \) for the normal ordering from the frequentist analysis, compared to NOvA, SK and IceCube. There is general agreement between the experiments, with T2K providing the strongest constraints on both parameters. Figure 38 compares the 90% confidence regions in \(\sin ^2\theta _{23}-\delta _{\scriptscriptstyle \textrm{CP}} \) for both orderings to NOvA and SK. The confidence intervals on \(\sin ^2\theta _{23}\) significantly overlap, as do the intervals for \(\delta _{\scriptscriptstyle \textrm{CP}}\). In the normal ordering, T2K excludes large regions of the NOvA constraint at the 90% confidence level, and NOvA excludes parts of T2K’s 90% confidence region. In the inverted ordering, the experiments consistently favour the \(\pi<\delta _{\scriptscriptstyle \textrm{CP}} <2\pi \) region, with a weak preference for the upper octant. Importantly, there is no significant tension between the experiments, and more data is needed to elucidate the matter. Furthermore, the joint oscillation analyses with the NOvA and SK collaborations will help address this.

9 Simulated data studies

Simulated data studies with the frequentist analysis were used to investigate the impact of alternative model predictions and data-driven tunes, discussed in Sect. 5.3, on the oscillation parameter constraints. The oscillation analysis in Sect. 8 already includes the uncertainty inflation derived from these studies, and this section summarises the procedure, with details provided in Appendix B.

9.1 Methodology

In the simulated data studies, the prediction from an alternative model is treated as the data at the ND, and is fit with the usual systematic uncertainty model. The parameters are fit to simulated data at the ND and are propagated to the FD, and the reconstructed energy spectrum and oscillation parameter constraints are compared to an “Asimov” data set. In an Asimov analysis, the parameters for the systematic uncertainties are set to specific values and the predicted spectra at each detector are treated as the data, giving an expectation of the sensitivity if no statistical fluctuations were present. For the oscillation parameters, two separate Asimov points were tested: one close to T2K’s best-fit point, and one with \(\delta _{\scriptscriptstyle \textrm{CP}} =0\) and non-maximal \(\sin ^2\theta _{23}\), detailed in Appendix B, Table 17. This section only presents results with the Asimov data near T2K’s best-fit point. The PDG 2019 reactor constraint on \(\sin ^2\theta _{13}\) is applied in the following studies, but had little impact on the overall conclusions. Although simulated data sets can result in both weaker and stronger constraints on the oscillation parameters than the expected sensitivity, they are only used to inflate the uncertainties in this analysis.
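The Asimov construction described above can be illustrated with a minimal sketch. It assumes a binned Poisson likelihood ratio and a hypothetical one-parameter toy model (`predict`, with an invented template); neither is taken from the analysis itself. It only shows that treating the prediction as the data gives zero \(\varDelta \chi ^2\) at the generating parameter values.

```python
import math


def poisson_chi2(observed, predicted):
    """Binned Poisson likelihood-ratio statistic, -2 ln(lambda)."""
    total = 0.0
    for n, mu in zip(observed, predicted):
        total += mu - n
        if n > 0:
            total += n * math.log(n / mu)
    return 2.0 * total


def predict(norm):
    """Hypothetical toy model: a fixed template scaled by one normalisation."""
    template = [12.0, 30.5, 18.2, 6.3]  # invented bin contents
    return [norm * t for t in template]


# Asimov data set: the prediction itself is treated as the data,
# so there are no statistical fluctuations.
asimov_data = predict(1.0)

chi2_at_truth = poisson_chi2(asimov_data, predict(1.0))  # exactly zero
chi2_shifted = poisson_chi2(asimov_data, predict(1.1))   # positive
```

Scanning `norm` in such a toy traces out the expected sensitivity curve, with its minimum at the generating value by construction.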

Fig. 38

Comparison of 90% confidence regions in \(\sin ^2\theta _{23}- \delta _{\scriptscriptstyle \textrm{CP}} \) over both mass orderings with NOvA  [138] and Super-K [139]. The T2K and NOvA confidence regions have been computed using the FC method, whereas the Super-K results are obtained with the constant \(\varDelta \chi ^2\) method

The simulated data set procedure mainly identifies two types of effects:

  • Systematic uncertainty model shortcomings: If the systematic uncertainty model is robust, or if the effect of the alternative model is small, the oscillation parameter contours obtained with the simulated data sets will show no bias with respect to the expected sensitivity. The bias is quantified as the percentage change of the middle of the \(1\sigma \) confidence interval of an oscillation parameter, relative to the \(1\sigma \) from systematic uncertainties in the expected sensitivity analysis. An example is discussed in Appendix B.1.

  • ND to FD extrapolation issues: Some alternative models may not produce a significant bias on the oscillation parameters, often due to the low sensitivity of the samples they affect. Issues in the extrapolation process can be exposed by comparing three distributions: (i) the predicted spectrum at the FD from fitting the Asimov data set at the ND, (ii) the predicted spectrum at the FD from fitting to the simulated data at the ND, and (iii) the predicted spectrum at the FD when applying the alternative model directly. Even though the bias on the oscillation parameters at T2K statistics may be small, simulated data studies may guide which of the systematic uncertainties are important to address in future T2K analyses and upcoming high-statistics experiments, such as Hyper-Kamiokande [142] and DUNE [143]. An example is discussed in Appendix B.2.
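The bias metric defined in the first item above can be written out as a short sketch. The function names and all numerical values are hypothetical, chosen only to show the calculation. The reference width `sigma_ref` would be the \(1\sigma \) from systematic uncertainties (or the total uncertainty) in the expected sensitivity analysis.

```python
def interval_midpoint(lower, upper):
    """Middle of a 1-sigma confidence interval."""
    return 0.5 * (lower + upper)


def bias_percent(asimov_interval, simulated_interval, sigma_ref):
    """Shift of the interval midpoint, as a percentage of a reference 1-sigma."""
    shift = (interval_midpoint(*simulated_interval)
             - interval_midpoint(*asimov_interval))
    return 100.0 * shift / sigma_ref


# Hypothetical 1-sigma intervals for a mass-squared splitting, in eV^2.
asimov = (2.45e-3, 2.55e-3)     # expected sensitivity (Asimov) fit
simulated = (2.46e-3, 2.56e-3)  # fit to an alternative-model data set
sigma_syst = 2.0e-5             # invented systematic-only 1-sigma

bias = bias_percent(asimov, simulated, sigma_syst)  # about 50 (percent)
```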

All the individual biases are summed in quadrature and are used to inflate the confidence interval for \(\varDelta {}m^2_{32}\), due to its simple Gaussian probability density. For \(\delta _{\scriptscriptstyle \textrm{CP}}\), the effect of systematic uncertainties is much smaller and the probability density is non-Gaussian, so a different method is applied. Each simulated data set is studied to see if it impacts any major claims in the analysis; in this analysis the 90% confidence interval of \(\delta _{\scriptscriptstyle \textrm{CP}}\). This is done by calculating the difference in the \(\varDelta \chi ^2\) distribution for \(\delta _{\scriptscriptstyle \textrm{CP}}\) for the Asimov data and the simulated data, and adding the difference to the \(\varDelta \chi ^2\) distribution for \(\delta _{\scriptscriptstyle \textrm{CP}}\) from the data, where the 90% confidence interval was calculated using the FC method mentioned in Sect. 8.2.
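The two treatments described above can be sketched as follows. This is a minimal illustration, not the analysis code. The sign convention of the \(\varDelta \chi ^2\) difference (simulated minus Asimov) is an assumption, and all inputs are invented.

```python
import math


def quadrature_sum(biases):
    """Combine individual biases by summing in quadrature."""
    return math.sqrt(sum(b * b for b in biases))


def shifted_delta_chi2(data_curve, asimov_curve, simulated_curve):
    """Add the (simulated - Asimov) difference to the data curve, point by point.

    Each curve is a list of delta-chi2 values on a common grid of the
    parameter of interest. The sign convention is an assumption of this sketch.
    """
    return [d + (s - a)
            for d, a, s in zip(data_curve, asimov_curve, simulated_curve)]


# Invented biases, e.g. in units of 1e-5 eV^2.
total_bias = quadrature_sum([3.0, 4.0])  # 5.0

# Invented delta-chi2 values on a two-point grid.
modified = shifted_delta_chi2([1.0, 2.0], [0.0, 0.5], [0.5, 0.5])  # [1.5, 2.0]
```

The modified curve would then be compared against the critical values from the FC method to see whether the 90% interval edges move.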

Simulated data sets can drastically increase or decrease the number of events at both detectors. In such cases, comparing the constraints on the oscillation parameters to the expected sensitivity conflates the effects of propagating mis-modelling from the ND analysis with the impact of increased or decreased statistics from the simulated data set. For instance, an alternative model that increases the number of events at the FD near the oscillation maximum will likely lead to a stronger constraint on \(\delta _{\scriptscriptstyle \textrm{CP}}\) due to the measurement being dominated by statistical uncertainties in those samples. To gauge this effect, the three predictions from the ND to FD extrapolation studies, outlined earlier, are used. If the model from the ND simulated data analysis predicts the spectrum of the alternative model well at the ND, and correctly predicts the spectrum at the FD compared to when directly applying the alternative model, a “scaled Asimov” approach is utilised. In these cases, two changes to the procedure are made: (a) the propagated ND constraint is the expected sensitivity to the systematic parameters if the real data is as predicted by the pre-fit model, and (b) the variation to the model that was used to build the simulated data set is also applied to the simulation that is being fit at the FD. This removes most of the statistical effect and better captures the features of propagating a mis-modelling in the ND analysis. For this analysis, the scaled approach was only used for the 2p2h Martini simulated data set.

The simulated data studies all concerned the interaction model and were detailed in Sect. 5.3. Details of the simulated data study procedure and two examples are provided in Appendix B.

9.2 Results

Table 15 summarises the observed biases on the oscillation parameters, showing the simulated data set with the highest impact from each category. The full results are shown in Appendix B, Table 18. The impact of the simulated data studies on \(\sin ^2\theta _{23}\) and \(\delta _{\scriptscriptstyle \textrm{CP}}\) was found to be small compared to the impact of statistical and systematic uncertainties. The largest bias on \(\varDelta {}m^2_{32}\) relative to the systematic uncertainty was found to be 57.8% from the pion SI simulated data set, and 20.8% relative to the overall uncertainty. Selected simulated data studies were added in quadrature (see Footnote 4) to avoid double counting similar physics effects, leading to an overall smearing on \(\varDelta {}m^2_{32}\) of \(1.35\times 10^{-5}~\text {eV}^2.\) For comparison, the overall uncertainty on \(\varDelta {}m^2_{32}\) from the expected sensitivity study, before the simulated data procedure, was \(5.7\times 10^{-5}~\text {eV}^2\) and is dominated by the uncertainty from statistics, which was \(5.3\times 10^{-5}~\text {eV}^2.\) Generally, the simulated data studies had a smaller impact on \(\delta _{\scriptscriptstyle \textrm{CP}}\), due to its uncertainty being dominated by the statistics in the electron-like selections at the FD.
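To put the quoted numbers in proportion, the short calculation below uses only the three values stated above. It assumes that statistical and systematic uncertainties combine in quadrature, and that the smearing also adds in quadrature to the total (a Gaussian approximation). Both are assumptions of this sketch rather than statements about the exact procedure.

```python
import math

# Values quoted in the text, in eV^2.
total_unc = 5.7e-5  # overall uncertainty from the expected sensitivity study
stat_unc = 5.3e-5   # statistical component
smearing = 1.35e-5  # additional smearing from the simulated data studies

# Implied systematic component, assuming total^2 = stat^2 + syst^2.
syst_unc = math.sqrt(total_unc**2 - stat_unc**2)  # approximately 2.1e-5

# Total after adding the smearing in quadrature (Gaussian assumption).
inflated_unc = math.hypot(total_unc, smearing)    # approximately 5.86e-5

relative_increase = inflated_unc / total_unc - 1.0  # roughly 3%
```

Under these assumptions the smearing is comparable in size to the implied systematic component, yet widens the overall uncertainty by only a few percent because statistics dominate.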

Table 15 Biases on the main oscillation parameters for each simulated data set, calculated as the shift in the middle of the \(1\sigma \) confidence interval relative to the overall uncertainty from systematic sources (“Syst.”) and the total (“Total”) to one decimal place

In the previous T2K analyses [1, 2], the simulated data study for the nucleon removal energy had a significant impact, especially on \(\varDelta {}m^2_{32}\), and an additional uncertainty was introduced. In this analysis, the updated nucleon removal energy uncertainty, described in Sect. 5, has caused it to no longer be the dominant source of systematic uncertainty.

Table 16 shows the changes to the 90% confidence interval for \(\delta _{\scriptscriptstyle \textrm{CP}}\) for each of the simulated data studies. The non-CCQE and the data-driven pion momentum modification simulated data sets had the largest impact, shifting the 90% confidence interval boundaries by 0.09 and 0.07 radians, respectively. The change to the 90% confidence limits does not alter the conclusions on the exclusion of CP-conserving values presented in Sect. 8.

Table 16 Shifts of the 90% confidence interval boundaries of \(\delta _{\scriptscriptstyle \textrm{CP}}\), in radians, as a result of the simulated data studies. The values in the top row correspond to the results of the data fit, assuming normal ordering. The values for each simulated data set are added to (subtracted from) the right (left) \(\delta _{\scriptscriptstyle \textrm{CP}}\) interval edge from the data fit. Only the absolute size of the shift is taken into account
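The convention stated in the Table 16 caption can be sketched in a few lines: each shift enters through its absolute value, is subtracted from the left edge and added to the right edge, so the interval can only widen. The function name and the interval edges below are hypothetical and are not the values from the data fit.

```python
def widen_interval(left_edge, right_edge, left_shift, right_shift):
    """Apply simulated-data shifts so that the interval can only grow.

    Only the absolute size of each shift is used: it is subtracted from
    the left edge and added to the right edge.
    """
    return left_edge - abs(left_shift), right_edge + abs(right_shift)


# Hypothetical interval edges and shifts, in radians.
new_left, new_right = widen_interval(-3.0, -1.0, 0.05, -0.09)
# new_left is about -3.05 and new_right about -0.91
```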

10 Conclusions

The T2K collaboration has measured the three-flavour PMNS neutrino oscillation parameters \(\varDelta {}m^2_{32}\), \(\sin ^2\theta _{13}\), \(\sin ^2\theta _{23}\), \(\delta _{\scriptscriptstyle \textrm{CP}}\), the Jarlskog invariant J, and the mass ordering, using the statistics at the FD equivalent to \(3.6\times 10^{21}\) POT. T2K continues to favour neutrino oscillations with near-maximal CP violation, in the upper octant of \(\sin ^2\theta _{23}\), in the normal mass ordering, with a \(\sin ^2\theta _{13}\) consistent with the measurements by reactor experiments.

The analysis included \(4.72\times 10^{20}\) POT more neutrino data at the FD, and \(5.73(4.48)\times 10^{20}\) POT more (anti-)neutrino data at the ND. For the first time, a neutrino flux constraint using charged pion data from a T2K replica target at NA61/SHINE was used, which approximately halves the flux uncertainty before the ND analysis. An updated neutrino interaction model with a refined initial-state and removal-energy model with associated uncertainties was also employed, amongst other improvements. High-statistics ND data were used to constrain the neutrino flux and interaction model uncertainties at the FD, and the magnetised ND also constrains the wrong-sign background of the neutrino beam. Biases from unmodelled systematic uncertainties were studied through simulated data studies, which acted to inflate the \(\varDelta {}m^2_{32}\) and \(\delta _{\scriptscriptstyle \textrm{CP}}\) confidence intervals. These results present the strongest constraints on several neutrino oscillation parameters, and are more consistent with the expected sensitivities to the oscillation parameters compared to T2K’s previous analysis.

The results are limited by statistics, and T2K will continue to take data as J-PARC upgrades [27, 28] the neutrino beam for the Hyper-Kamiokande experiment [142]. In preparation, the T2K beamline has recently undergone a long shutdown, and will be operating the magnetic horns at 320 kA current, with beam power in excess of 700 kW in the near future [27, 28]. Upcoming analyses at the FD will expand selections to include multiple Cherenkov rings, increasing statistics by approximately 30%. The FD has also begun collecting data with gadolinium doped in the ultra-pure water [39], drastically increasing the efficiency in tagging interactions producing neutrons. At the ND, selections are being developed to improve the understanding of nuclear effects in neutrino interactions, such as 2p2h, in-medium corrections, and the initial state, thus addressing the larger systematic uncertainties in this analysis. Furthermore, the ND280 upgrade [144,145,146] will be ready to take data in 2023, providing significantly improved reconstruction capabilities for low momentum protons and pions with full angular coverage, which will allow for detailed study of nuclear effects, in addition to measurements of neutron kinematics. Moreover, the T2K collaboration is actively working with the NOvA and SK collaborations on combined neutrino oscillation analyses, taking advantage of synergies in experiment design to lift current degeneracies and increase statistics.