Observation of the semileptonic decay $B^{+}\to p\overline{p}\mu^{+}\nu_{\mu}$

The Cabibbo-suppressed semileptonic decay $B^{+}\to p\overline{p}\mu^{+}\nu_{\mu}$ is observed for the first time using a sample of $pp$ collisions corresponding to an integrated luminosity of 1.0, 2.0 and 1.9fb$^{-1}$ at centre-of-mass energies of 7, 8 and 13TeV, respectively. The differential branching fraction is measured as a function of the $p\overline{p}$ invariant mass using the decay mode $B^{+}\to J/\psi K^{+}$ for normalisation. The total branching fraction is measured to be \begin{align*} \mathcal{B}(B^{+}\to p\overline{p}\mu^{+}\nu_{\mu}) = (5.27 ^{+0.23}_{-0.24} \pm 0.21 \pm 0.15)\times 10^{-6}, \end{align*} where the first uncertainty is statistical, the second systematic and the third is from the uncertainty on the branching fraction of the normalisation channel.


Introduction
Studies of semileptonic B meson decays have recently generated interest due to a number of anomalies in experimental results. Measurements of the observables R(D) and R(D * ) [1][2][3][4][5][6] have shown hints of lepton non-universality with a combined significance of over 3 σ [7]. To probe the flavour structure of possible new physics contributions to these decay modes, it is desirable to make analogous measurements for decays involving different quark-level processes, such as b → u transitions. To that end, the decay mode B + → pp + ν is promising experimentally, particularly when performing the measurement at a hadron collider. The requirement of a proton anti-proton pair in the final state should significantly reduce combinatorial background, which would otherwise be significant for final states with pions.
Semileptonic decays of B mesons to a final state containing multiple baryons are as yet unobserved. A theoretical model of B + → pp + ν decays has been constructed with perturbative QCD (pQCD) [8]. This model is based on studies of several fully hadronic B → Y Y X decays where Y and Y represent baryons and X one or more mesons. By fitting the angular distributions and decay rates of the hadronic modes the authors of Refs. [8][9][10] estimate the differential rate of B + → pp + ν decays. They also predict the total branching fraction of the B + → pp + ν decay to be (1.04 ± 0.38) × 10 −4 for l = µ, e leptons. This prediction motivated a search by the Belle collaboration for this channel that lead to evidence for the B + → ppe + ν e decay mode with 3.0 σ significance [11]. The branching fraction was measured to be (8.2 +3.7 −3.2 ± 0.6) × 10 −6 , one order of magnitude smaller than the prediction.
The measurements of the fully hadronic modes show features that merit further investigation. It is surprising that the branching fractions of decays of B mesons to final states comprising only two baryons are suppressed compared to those of two baryons and one or more extra final state particles [12]. For example, the branching fraction of B 0 → pp is two orders of magnitude smaller than that of the similar B 0 → ppπ + π − decay [12,13]. Furthermore, the invariant-mass distributions of the baryon pair in B → Y Y X decays show a characteristic accumulation at low values, called the threshold enhancement effect [14][15][16][17]. Understanding the dynamics that lead to such features is difficult in fully hadronic decays, due to the interaction of the two baryons with the extra hadrons. It is therefore desirable to study semileptonic decays, such as B + → ppµ + ν µ , where such final-state interactions are absent.
In this paper, the first observation of the decay B + → ppµ + ν µ is presented. As the dynamics of the transition are not known, the branching fraction is measured in bins of pp invariant mass. These bins are then summed to obtain a measurement of the total branching fraction. The decay B + → J/ψ K + , with J/ψ → µ + µ − , is chosen as the normalisation mode as it is fully reconstructed and can pass similar selection requirements to the signal. The branching fraction within a bin i is where N i (B + → ppµ + ν µ ) is the yield of B + → ppµ + ν µ candidates in bin i, N (B + → J/ψ K + ) is the total yield of B + → J/ψ K + candidates and represents the product of the detector acceptance and the reconstruction and selection efficiencies of the two modes. The branching fractions of B + → J/ψ K + and J/ψ → µ + µ − decays are taken from Ref. [12].
The signal yields are extracted from fits to a variable called the corrected mass, which accounts for the unreconstructed neutrino in the signal decay. It is defined as [18] where |p ⊥ | is defined as the magnitude of the reconstructed ppµ + momentum transverse to the B flight direction and m 2 ppµ is the square of the ppµ + invariant mass. This study uses the data collected with the LHCb detector in proton-proton collisions in 2011, 2012 and 2016. This corresponds to integrated luminosities of 1.0, 2.0 and 1.9 fb −1 at centre-of-mass energies of 7, 8 and 13 TeV, respectively. The 2011 and 2012 data sets are treated together and collectively referred as the Run 1 data set. Charge conjugate processes are implied throughout this paper.

Detector and simulation
The LHCb detector [19,20] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks. The detector includes a high-precision tracking system consisting of a siliconstrip vertex detector surrounding the pp interaction region [21], a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes [22,23] placed downstream of the magnet. The tracking system provides a measurement of the momentum, p, of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 GeV/c. The minimum distance of a track to a primary vertex (PV), the impact parameter (IP), is measured with a resolution of (15 + 29/p T ) µm, where p T is the component of the momentum transverse to the beam in GeV/c. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov (RICH) detectors [24]. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers [25].
The online event selection is performed by a trigger [26], which consists of a hardware stage that performs some basic selection, followed by a software stage, which applies a full event reconstruction. At the first level, a track consistent with being a muon with significant p T is required to be present in the event. Subsequently in the software stage, two tracks are required to form a secondary vertex with significant displacement from a pp interaction vertex. A multivariate algorithm [27] is used to identify vertices that are consistent with the decay of a b hadron.
Simulation is used to determine the efficiency of the signal mode and estimate the shapes of the signal and several backgrounds modes in the fits to the m corr distribution. In the simulation, pp collisions are generated using Pythia [28] with a specific LHCb configuration [29]. Decays of unstable particles are described by EvtGen [30], in which final-state radiation is generated using Photos [31]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [32], as described in Ref. [33]. The generated B meson p and p T spectra are corrected to match the data distributions. A boosted decision tree (BDT) weighter [34] is trained on samples of B + → J/ψ K + data and simulation, independent of those used for the normalisation of the branching fraction. This is then used to correct all the simulation samples used in the analysis.

Selection
Signal candidates are constructed from three charged tracks which are required to be of good quality and well separated from any PV. The tracks must also have particle identification consistent with their particle hypothesis. The requirement for positive proton identification enforces a minimum value of p of 18 GeV/c such that the protons are above the threshold for radiating in the RICH. Similarly, the muons must have p above 3 GeV/c to propagate through the muon stations. All the tracks must have p T larger than 1.5 GeV/c and form a good-quality vertex significantly displaced from the PV with which the candidate is associated. The signal muon must have fired the hardware trigger and the reconstructed B + candidate formed by the three tracks must be consistent with the object that fired the software trigger. Potential decays of η c , J/ψ and ψ(2S) mesons to pp are removed with vetoes in the pp invariant mass of ±50 MeV/c 2 around their respective known masses [12].
The selection of the B + → J/ψ K + normalisation mode is aligned with that of the signal to reduce systematic uncertainties. The selection criteria for the signal protons are applied to the kaon and the muon of opposite sign (K + µ − ), with the exception of the particle identification criteria. The selection of the other muon is the same as that of the muon in the signal decay.
Further selection is used to reduce several sources of backgrounds relative to the signal. In total there are five variables to which selection is applied, with the chosen criteria on each optimised together. These variables, and the backgrounds targeted by them are described in the following paragraphs.
The largest background contribution comes from a mixture of partially reconstructed decays producing two protons and a muon in the final state. It is expected that the largest among these originates in b → c quark transitions. The most pernicious is B → Λ − c pµ + ν µ X decays, where X represents any number of charged or neutral pions (including none) and the Λ − c baryon decays to a final state including one proton. The other major background arises from B → ppDX decays, where the D meson may be of any variety (D 0 , D + , D * + , etc.) and ultimately decays to a final state with a muon. The contribution of B → pΛ − c X decays with the Λ − c baryon decaying semileptonically is comparatively small, as the semileptonic branching fraction is dominated by Λ − c → Λl −ν l decays. The Λ baryon flies a sufficient distance within the detector before decaying such that the resulting proton is not associated with the B decay vertex. Another source of partially reconstructed background is formed of B → ppµ + ν µ X decays, where X denotes one or more charged or neutral pions. These decays may proceed with intermediate N * or ∆ resonances and could naively be expected to have similar branching fractions to the signal. If any of these partially reconstructed decay modes produces charged tracks in addition to the ppµ + candidate, it can be efficiently suppressed with an isolation technique. Once a signal candidate has been constructed, the other tracks in the event close to the B decay vertex are examined. A BDT is used to identify those nearby tracks that can be associated with the signal candidate decay vertex. If the candidate is truly signal, there should be few other tracks that can be associated with it and the BDT should classify them with a low score. On the other hand, the extra track(s) from a partially reconstructed decay returns a high score if such tracks are found. The isolation algorithm returns the BDT output for the four tracks most likely to have come from the B vertex. These four numbers are themselves combined into a single BDT classifier, known as the charged-isolation BDT. This BDT is trained on simulation to discriminate signal from B + → Λ − c pµ + ν µ decays, which is expected to be the largest mode with extra charged tracks. The efficacy of this BDT in reducing such background is shown in Fig. 1(a). The indicated requirement on the charged BDT score rejects 80% of the major background decay B → Λ − c pµ + ν µ X, whilst retaining 93% of the signal.
For those partially reconstructed final states with only additional neutral particles, further suppression is achieved by considering the kinematics of the decays. An additional BDT, the so called part-reco BDT, considers 11 variables: the impact parameter significance of the three final-state tracks, the pp pair and the B + candidate with respect to the PV; the impact parameters of the tracks with respect to the fitted B + decay vertex; the χ 2 of the B + vertex fit; the angle between the B + candidate momentum and displacement vectors; and the difference between the p and p momenta. The part-reco BDT is trained on simulation in order to discriminate signal from a mixture of all the considered background modes. The result of this training is shown in Fig. 1(b). The selection on the part-reco BDT output removes 18% of the decays B → ppD and keeps 98% of the signal.
An additional background arises from particles that are misidentified as protons (misID). The particle identification requirements on the proton tracks are therefore further tightened. Background due to hadrons being misidentified as muons is considered and reduced to a negligible amount with a loose particle identification requirement.
In addition to the two BDTs and proton identification criteria, one further quantity is considered. The uncertainty on the corrected mass of the candidate is powerful in improving the separation between signal and background [35]. It is calculated from the estimated uncertainties on the positions of the B + primary and secondary vertices, and the momenta of the tracks. Selecting lower values of the corrected-mass uncertainty produces a sharper peak for the signal mode in the corrected mass distribution, which will aid the discrimination of the signal from background in the fit to determine the yield. Therefore, in total the selection uses five quantities (two BDTs, the proton PID, the muon PID and the corrected-mass uncertainty). In order to ascertain the optimum selection, a five dimensional grid search is performed using pseudoexperiments. Data sets are generated from the simulation samples with the expected proportions of each background. The expected signal amount is taken from the central value of the B + → ppe + ν e branching fraction reported by the Belle collaboration [11]. For the backgrounds, the current averages for the branching fractions are used if they have been measured. For those backgrounds that have not been measured, their branching fractions are estimated relative to that expected for the signal, accounting for different CKM matrix elements and the available phase space. For each point in the grid, the selection is applied to the simulation to estimate the efficiency for each component. The efficiency of the PID requirements on the simulation is estimated with a method based on data control samples [36]. For each data set the m corr variable is simulated and the expected relative uncertainty on the signal yield is found by a fit to the simulated pseudodata. These fits are not binned in m(pp) but consider the entire sample. The selection that produces the smallest relative uncertainty on the signal yield is chosen.

Signal and normalisation yields
The yields of the signal and normalisation modes are ascertained with unbinned extended maximum-likelihood fits. In the case of the normalisation mode, the invariant mass distribution of the J/ψ K + candidates is fitted. The 2011, 2012 and 2016 data sets are fitted separately and then the yields combined. The fit to the 2016 data set is shown in Fig. 2.
For the signal mode, the corrected mass is fitted. The distribution of this variable peaks at the B + mass for candidates where one massless particle is not reconstructed. On the other hand, candidates from partially reconstructed decays that are missing one or more massive particles in addition to the neutrino have wide distributions concentrated at lower corrected mass values. The Run 1 and 2016 data are combined and fitted together to improve the fit stability.
The shapes for the signal component and contributions from partially reconstructed decays are determined using simulation. The shape of the signal probability density function (PDF) is parametrised by the sum of four bifurcated Gaussian functions with a common mean. The parameters of the Gaussian functions as well as their relative fractions are all fixed in the fit. All of the background PDFs are accounted for with kernel density estimation [37]. The shape of the proton misID background comes from a separate independent data sample in which the particle identification requirements on one of the protons have been removed. Using this data sample, the expected number of proton misID candidates can be estimated and used as a constraint in the fit. A background component due to random combinations of protons and muons, referred to as the combinatorial background, is included in the fit. A sample of data for which the B + decay vertex quality selection has been reversed is used to estimate the shape of this background.
The yields of the signal, proton misID, combinatorial and partially reconstructed decays are determined by the fit, as are the relative fractions of each partially reconstructed mode. All of the fit parameters are free to vary with the exception of the misID yield which is constrained.
The fit in each m(pp) bin is performed independently. The m corr distributions in each bin, and the resulting fits are shown in Fig. 3. In each bin the fits are validated using pseudoexperiments. An ensemble of 10 5 data sets is generated and fitted with the component yields taken from the fits to data. Some small biases on the signal yield are found and these are considered as a source of systematic uncertainty.

Efficiency
The efficiencies for the signal and normalisation modes to be reconstructed and selected are both assessed with simulation. Corrections are applied to account for known differences between data and simulation in the track-reconstruction efficiency [38] and the efficiency of the hardware trigger [39]. The efficiency of the particle identification requirements on each track is evaluated with data [36] and applied to the simulation.
The binning in m(pp) reduces the dependence on the model of the B + decay when calculating the efficiency of the signal mode. However, as there are selection requirements on kinematic quantities of the candidates, there is still some residual dependence on the dynamics of the decay. The simulation is therefore weighted to represent the pQCD model of Ref. [8] as the current best estimate of how the decay proceeds. This weighting corrects the distribution of the invariant mass of the µ + ν µ system. The variation of the parameters of this model is considered as a source of systematic uncertainty.
The ratio of selection efficiencies between the signal and normalisation modes in each bin of m(pp) is shown in Table 1. These efficiencies are presented separately for the Run 1 and 2016 samples. They are combined to form an overall efficiency ratio, accounting for the difference in sample sizes between Run 1 and 2016. This combination is calculated   using the measured B + production cross-sections [40] and integrated luminosities of each data set.

Systematic uncertainties
The systematic uncertainties can be split into two categories: those that affect the calculation of the ratio of efficiencies of the signal and normalisation modes and those that may change the determination of the signal yield in the fit. For the former, each of the corrections to the simulation contributes a source of uncertainty both from the limited sizes of the samples used to derive the corrections and from the method of deriving them. The method of correcting the p and p T distributions of the B + mesons in the simulation may give rise to a systematic uncertainty. The parameters of the BDT weighter used to correct these distributions are altered and the relative efficiencies recalculated, with the difference to the nominal relative efficiency being the assigned uncertainty. An additional uncertainty due to any residual differences between data and simulation is determined using the B + → J/ψ K + decay mode. The difference in efficiency due to the selection on the two BDTs and corrected-mass uncertainty is compared between data and simulation.
To account for the uncertainty in the correction of the simulation for the reconstruction efficiency of each track, the applied weights are varied within their uncertainties and the relative efficiencies recalculated. Similarly, an uncertainty is assessed for the particleidentification weights applied to each track. The uncertainty due to the limited simulation sample sizes used to calculate the efficiencies is also included.
A further uncertainty is due to the physics model that the simulation is weighted to represent. The model affects the kinematic distributions of the final state tracks which feeds into the efficiency calculation as these distributions are biased by the selection requirements. Since the model is unproven a conservative uncertainty is taken. New sets of weights for the simulation are created that sample extreme variations of the model parameters (±5 σ), and for each variation the efficiency is recalculated. Despite this extreme test, the systematic uncertainty due to the physics model is not dominant, which reflects the flat selection efficiency over the kinematic ranges in which the final-state particles lie within each bin of m(pp). Finally, the uncertainties on the B + production cross-section [40] and integrated luminosities of the data samples are combined to give the systematic uncertainty on the averaging of the efficiencies when combining Run 1 and 2016.
In the corrected-mass fit, uncertainties arise from potential variations in the shapes Table 2: Summary of the systematic uncertainties on the differential branching fractions. The contributions pertaining to the efficiency estimate are first, those for the yield extraction are below. The particle identification and tracking efficiency uncertainties are assumed to be 100% correlated between Run 1 and 2016. The total correlations of the uncertainties between the bins are shown in Table 4. of the components. This variation is assessed with pseudoexperiments. Data sets are generated with the nominal fit model and then fitted with both the nominal model and an alternative. The width of the distribution of differences between the nominal and alternative fits is taken as the uncertainty. For those components that rely on kernel density estimators, a systematic uncertainty is assessed for the choice of smoothing parameter by varying it. The uncertainty due to the choice of model for the signal shape is found by replacing the nominal PDF with one constructed with kernel density estimators. The uncertainty due to the limited sizes of the simulation samples is determined by generating new simulation from the nominal fit PDFs with the same sample sizes and making alternative PDFs with those samples. Similarly, an estimate of the uncertainty on the shape of the proton misID background component is assessed. For the shape of the combinatorial background component, an alternative data sample is trialled which requires the two protons to be of the same charge. Finally, the small biases in the fit noted in Sec. 4 are included.

Source
A summary of the systematic uncertainties is presented in Table 2. They are given as relative uncertainties on the branching fraction with the combination accounting for the correlation of the uncertainties between the two data sets. The correlations of the total uncertainties between the bins are shown in Table 4 and the covariance matrix is presented in Table 5, in the appendix. Table 3: Number of observed B + → ppµ + ν µ candidates and differential branching fraction in each bin of m(pp). The uncertainties on the signal yields are statistical only. For the differential branching fractions the first uncertainties are statistical, the second systematic and the third from the uncertainties on the branching fractions of the normalisation channel.

Results
The fitted yields for the signal mode are presented in Table 3. The extracted yields of the normalisation channel are 14 930 ± 260 for 2011, 31 380 ± 190 for 2012 and 49 270 ± 250 for 2016. Combining these with the efficiency ratios from Sec. 5, the differential branching fraction in each m(pp) bin is calculated. The results are presented in Table 3. The relative differential branching fractions are summed over the bins, with the correlation of the systematic uncertainties between the bins accounted for, to give the total relative branching fraction of where the first uncertainty is statistical and the second systematic. Multiplying this by the current average of the normalisation branching fraction [12] leads to where the third uncertainty is from the normalisation branching fraction. Finally, the absolute differential branching fraction as a function of m(pp) is shown in Fig. 4, where the indicated uncertainties include statistical, systematic and normalisation uncertainty contributions. As expected from the theory model and the analogous hadronic decays, the differential distribution peaks at a very low value and falls off sharply. The measured total branching fraction agrees with the previous measurement from the Belle collaboration and represents the first observation of the B + → ppµ + ν µ decay mode.