Search for the rare decay $B^{+} \rightarrow {\mu}^{+}{\mu}^{-}{\mu}^{+}{\nu}_{{\mu}}$

A search for the rare leptonic decay $B^{+} \rightarrow {\mu}^{+}{\mu}^{-}{\mu}^{+}{\nu}_{{\mu}}$ is performed using proton-proton collision data corresponding to an integrated luminosity of $4.7$ fb$^{-1}$ collected by the LHCb experiment. The search is carried out in the region where the lowest of the two ${\mu}^{+}{\mu}^{-}$ mass combinations is below $980$MeV/c$^{2}$. The data are consistent with the background-only hypothesis and an upper limit of $1.6 \times 10^{-8}$ at 95% confidence level is set on the branching fraction in the stated kinematic region.


Introduction
Leptonic decays of the B + meson are rare, as branching fractions are proportional to the squared magnitude of the small Cabibbo-Kobayashi-Maskawa (CKM) matrix element V ub . Among these processes, the decays B + → τ + ν τ and B + → µ + ν µ have precise Standard Model (SM) predictions [1] given the absence of hadrons in the final state. 1 Due to helicity suppression, they are also highly sensitive to particles predicted in extensions of the SM such as charged scalars [2]. Measurements of the B + → τ + ν τ decay from the B factories [3][4][5][6] lead to an average branching fraction of (1.4 ± 0.3) × 10 −4 [7] consistent with the SM prediction within the experimental uncertainty. An upper limit of 1.1 × 10 −6 [8] is set on the B + → µ + ν µ branching fraction at 90% confidence level.
The radiative version of the muonic decay, B + → µ + ν µ γ, is important for two reasons; it is a background for the B + → µ + ν µ decay, and its branching fraction is a direct measurement of the inverse moment of the B meson light cone distribution amplitude, which is very difficult to calculate theoretically [9]. The upper limit on the branching fraction for the B + → µ + ν µ γ decay is 3.0 × 10 −6 [10] at 90% confidence level.
A B decay vertex with just a single charged particle makes a search for the B + → µ + ν µ and B + → µ + ν µ γ decays highly challenging in the LHC environment. This problem is not present for the decay B + → µ + µ − µ + ν µ , depicted in Fig. 1. The decay receives a contribution from the B + → µ + ν µ γ * with γ * → µ + µ − amplitude, where the annihilation to the µ + ν µ pair occurs through an intermediate B * meson. It also recives contributions from the B + → µ + ν µ V amplitude, where V denotes a vector meson such as the ω or the ρ, that can decay to a pair of muons. With these contributions, nearly all decays have a muon pair with a mass below 1 GeV/c 2 . A recent theoretical calculation based on vector-meson dominance predicts that the corresponding branching fraction, B(B + → µ + µ − µ + ν µ ), is around 1.3 × 10 −7 [11].
This paper describes a search for the decay B + → µ + µ − µ + ν µ using a partial reconstruction method that infers the momentum of the missing neutrino to obtain a mass estimate of B + → µ + µ − µ + ν µ decays. This search uses proton-proton (pp) collision data corresponding to an integrated luminosity of 4.7 fb −1 collected during the three periods 2011 (7 TeV collision energy), 2012 (8 TeV) and 2016 (13 TeV) at the LHCb experiment. The detector is described in Sec. 2, followed by a description of how the signal is separated from backgrounds using two multivariate classifiers in Sec. 3. The evaluation of the

Selection
Signal B + decay candidates are reconstructed by combining one negatively and two positively charged tracks. These tracks are required to be of good quality, be inconsistent with originating from any PV, be positively identified as muons and form a good-quality SV displaced from any PV. The PV with the smallest χ 2 IP is the associated PV, where χ 2 IP is defined as the difference in the vertex-fit χ 2 of a given PV reconstructed with and without the B + trajectory included. The momentum vector of the B + decay products is required to point in the same direction as the line connecting the associated PV and the SV with an allowance made for the momentum that is carried away by the neutrino in the decay.
At most one hit in the muon stations is allowed to be shared between two different muon candidates. This reduces the rate of hadrons misidentified as muons when there is already a muon of the same sign in the detector. In this analysis that has two muons of the same sign in the final state it is essential to reduce this type of misidentification. The search for the signal is performed in the region where the lower of the two µ + µ − mass combinations is below 980 MeV/c 2 to avoid potential background from φ → µ + µ − decays. Moreover, above this mass the combinatorial background grows and the expected signal yield is minimal, making a search there difficult. Backgrounds originating from candidates involving J/ψ and ψ(2S) decays are removed by vetoing the mass regions 2946 MeV/c 2 < M µ + µ − < 3176 MeV/c 2 and 3586 MeV/c 2 < M µ + µ − < 3766 MeV/c 2 of the higher of the two µ + µ − mass combinations. Finally, a tight particle identification (PID) selection, based on a neural network, is applied to reject misidentified hadrons.
The missing neutrino in the reconstruction of the B + candidate is accounted for with the addition of the momentum component perpendicular to B meson flight direction, p ⊥ . This direction is determined from the position of the PV where the B + meson is produced and the SV where it decays. The resulting corrected mass is defined as, where M µµµ is the mass of the three muons. Candidates are kept if they satisfy 4000 MeV/c 2 < M corr < 7000 MeV/c 2 . Inside this a signal region is defined as 4500 MeV/c 2 < M corr < 5500 MeV/c 2 . To avoid any bias in the development of the signal selection algorithm, the data in this region was not analysed until the selection was finalised and the systematic uncertainties evaluated. The uncertainty on the corrected mass is dominated by the resolution of the SV.
To reduce combinatorial background, where random tracks are combined to emulate the signal, a boosted decision tree classifier (BDT) [24] with the AdaBoost algorithm [25] as implemented in the TMVA toolkit [26] is used. The BDT classifier is trained using simulation as a signal sample and the upper sideband M corr > 5500 MeV/c 2 of data as a proxy of the combinatorial background candidates. To best exploit the limited amount of data available for training, a ten-fold cross-validation method [27] is employed. The BDT contains information about kinematic and geometric properties of the B + candidate and associated muon tracks together with the total number of reconstructed tracks in the event. The most distinguishing properties between signal and combinatorial background candidates are the isolation of the decay vertex (as described in Ref. [28]), the χ 2 of the B + vertex, and the χ 2 IP with respect to the associated PV for all three muon candidates.
The requirement on the BDT response is optimised by maximising the figure of merit ε S √ n B +3/2 [29] where ε S is the signal efficiency of the selection and n B refers to the estimated number of background candidates in the signal region. The optimal BDT working point is 40% efficient on simulated signal events while rejecting 99% of the combinatorial background. For the optimisation, only relative changes in signal efficiency are relevant and these are obtained from the simulation.
A second BDT is trained to reject contamination from misidentified background. This background originates mostly from cascade decays where a b hadron undergoes a semileptonic decay through the dominant b to c transition and the resulting c hadron also decays semileptonically. The second BDT shares the same architecture, features and working-point optimisation strategy as the BDT designed to reject combinatorial background. It is trained on a background sample selected in data where two tracks are positively identified as muons and the third track is required to be in the fiducial region covered by the muon chambers but with a veto on muon identification. The signal sample is using the simulated sample after it has been accepted by the first BDT. The optimisation results in that 40% of the signal sample is retained and 94% of the misidentified background is rejected.
The overall selection results in 1797 candidates. There are no events with multiple candidates. The total efficiency for selecting the signal is about 0.1%.

Background estimation
The main categories of background are: combinatorial; misidentified combinations, where two muons are correctly identified but the third particle is a misidentified hadron; and partially reconstructed that have an almost identical final state to the signal.
As the combinatorial background arises from random combinations of three correctly identified muons, it has no peaking features in the considered region of corrected mass. Its contribution is estimated as part of the final fit to the data.
In order to estimate the number of misidentified background candidates and their distribution in the M corr variable, a data sample is obtained with the same selection as for the signal, apart from a reversal of the muon identification requirements for one of the candidate tracks. This track is still required to be within the fiducial volume of the muon detector. This selects a sample of µ + µ ± hX candidates in data, where h denotes any hadron of either negative or positive charge. The sample is a mixture of partially reconstructed b-hadron decays, where both the b-hadron and the subsequent charm hadron decays semileptonically, and combinatorial background. Backgrounds where two hadrons are identified as muons are only contributing to the selected events at an insignificant level.
Probabilities of misidentifying hadrons as muons are obtained from data as a function of momentum and pseudorapidity by using control samples where the hadron species are determined purely from the kinematic properties of the decay chain [15]. As the misidentification probability is different for pions, kaons and protons [30], the species of the hadron must be determined. This is done by isolating the hadrons in the µ + µ − hX sample into separate hadron PID regions and then taking into account the cross-feed, calculated using an iterative approach, between these regions. The iterative approach splits the data sample into three PID regions, where the hadron candidate is consistent with the kaon, pion and proton hypotheses, respectively. Initially, the number of misidentified candidates of a given species is assumed to be zero, and the cross-feed between regions is calculated. From this first estimate of the number of misidentified particles in each of the PID regions, the cross-feed can then be recalculated. The process repeats until the number of total misidentified particles does not change significantly from one iteration to the next when compared to the statistical uncertainty from the sample size.
Once the cross-feed between the different hadron species has been taken into account, the probability for a specific hadron to pass the stringent muon PID requirements applied in the analysis is calculated. The presence of the two real muons in the µ + µ − hX background increases the probability to misidentify the hadron as a muon, mainly due to hit sharing in the muon stations. To take this into account, the hadron misidentification probability is obtained using the decay B 0 → J/ψ K * 0 , with J/ψ → µ + µ − and K * 0 → K + π − , as a calibration sample where. It has two muons present as in the signal, and the kaon and pion can be identified without PID requirements on the particle under consideration. In this way the probability of identifying the kaon or the pion from the K * 0 decay as a muon can be measured. Double misidentification in the calibration sample, where the kaon and pion hypotheses are swapped, is reduced by requiring a loose hadron identification on the hadron not under consideration for misidentification and subsequently fitted for. The background coming from protons misidentified as muons is insignificant, requiring no further action.
The final distribution of the misidentified background in M corr is obtained by multiplying the sample with the muon identification reversed with the relevant h → µ misidentification probabilities.
The level of partially reconstructed backgrounds, where three muons are correctly identified but one or more particles in addition to a neutrino are not reconstructed, is determined using simulation. An example of this type of decay is B → D 0 µ + ν µ X where D 0 → K + π − µ + µ − and the K + , π − and X particles are not reconstructed. For this particular background, the measurements of the branching fractions of D 0 → K + π − µ + µ − [31] and B → D 0 µ + ν µ X [32] are used. In total, partially reconstructed backgrounds are estimated at the level of eleven candidates in the signal region of corrected mass.
Other potential backgrounds are considered. The decay B + → K + µ + µ − with the kaon misidentified as a muon contributes in candidates with a corrected mass outside the signal region. This is not the case for the B + → π + µ + µ − decay, but the low branching fraction combined with the requirement for misindentification of the pion results in a negligible background level. The decay B + → η ( ) µ + ν µ , followed by the decay η ( ) → µ + µ − γ, is also considered and found to be at a negligible level after the selection criteria are applied. Finally, backgrounds that involve a charmonium state decaying to a pair of muons are excluded by the previously mentioned vetos on the J/ψ and ψ(2S) masses.

Normalisation method
The branching fraction of a B + → µ + µ − µ + ν µ signal is obtained by normalising to the B + → J/ψ (→ µ + µ − )K + decay as  where N is the yield of the decay, ε is the overall efficiency to reconstruct and select the decay. The braching fractions are taken from Ref. [32].
The B + → J/ψ K + candidates are selected in the same way as the signal, except that the third particle must be consistent with the kaon hypothesis and the dimuon mass consistent with the J/ψ mass. This reduces the impact of systematic uncertainties related to the ratio of efficiencies in Eq. (2). Most of the signal and normalisation selection efficiencies are estimated using simulation. Efficiencies of the PID are obtained using control data samples where identities of the final-state particles can be deduced from the kinematics of the decay. The total efficiency of the B + → µ + µ − µ + ν µ signal is around 37% relative to the normalisation channel. This lower efficiency is caused by the lower dimuon mass for the signal that affects the trigger, reconstruction and BDT efficiencies. The muon PID requirements are also less efficient due to the sharing of muon hits between the different final-state muons in the signal decay.
The B + → J/ψ K + yield is determined by performing an unbinned extended maximumlikelihood fit to the µ + µ − K + mass distribution. The shape of the B + → J/ψ K + mass distribution is described by a Hypatia function [33] that accounts for non-Gaussian tails on both sides of the peak. In the fit, the mean and width parameters are allowed to vary and all other parameters are determined from simulation. The shape of the misidentified background contribution of B + → J/ψ π + decays is modelled with a Gaussian core with power law tails on each side of the peak. The mean and width are allowed to vary freely in the fit while the tail parameters are determined from simulation. Combinatorial background is parameterised with an exponential function with a decay constant that is allowed to vary in the fit. The result of the fit is shown in Fig. 2    : Template distributions for signal and misidentified background shapes for high and low fractional corrected mass uncertainty. A low uncertainty on the corrected mass corresponds to data with better mass resolution. The shape of the misidentification template is obtained from a control sample while the signal template is obtained from simulation.

Signal yield determination
In order to determine the B + → µ + µ − µ + ν µ signal yield, an extended unbinned maximumlikelihood fit is performed to the corrected mass distribution. To improve the sensitivity of the mass fit, an event-by-event uncertainty on the corrected mass is calculated by propagating the uncertainties of the PV and SV. The data is then split into two equally sized regions with high and low fractional corrected mass uncertainties. This improves the branching fraction sensitivity by approximately 11% due to the different signal distributions in the two samples, as shown in Fig. 3.
The signal shape is modelled with the sum of two Gaussian functions with power law tails, where the tails are on both sides of the peak. The parameters of the signal shape are determined using simulation and kept fixed in the subsequent fit to the data. The combinatorial background is modelled using an exponential function, whose slope is allowed to vary in the fit and whose parameterisation is verified using simulation. The yield is left free to float in the fit.
The background from misidentified muons is obtained from the µ + µ − hX control sample described in Sec. 4. The distribution and yield of this sample is fitted to a Gaussian function with a power-law tail at high corrected mass. This parameterisation is cross-checked by fitting a sample with a looser muon identification requirement. The uncertainties on the associated parameters are propagated to the fit using a multivariate Gaussian constraint. The shape and the yield of the partially reconstructed background are taken from simulation. Yields that are obtained from control samples and simulation are allowed to vary in the fit within constraints from a Poisson distribution.
The fit to the corrected-mass distribution, combining both corrected-mass uncertainty categories, is shown in Fig. 4  fit component being slightly below the sum of the background contributions. As there is no significant signal component, a limit on the branching fraction, at 95% confidence level is set using the CL s method [34]. From pseudoexperiments, the expected upper limit is found to be 2.8×10 −8 and the present result represents a downward fluctuation of 1.4σ. Systematic uncertainties are included in this limit and are discussed in the following section.

Systematic uncertainties
A summary of the systematic uncertainties is given in Table 1, yielding a total relative uncertainty of 16% on the normalisation of the branching fraction of the signal. The largest systematic uncertainty arises due to the choice of the shape for the combinatorial background. If the combinatorial background is allowed to have two components with different exponential slope, the upper limit on the branching fraction changes by 14.2%. While the fit does not improve from adding in an extra component, its existence cannot be excluded from the fit to the data.
In simulation, the nominal signal model, as described in Sec. 2, creates a photon pole, increasing the branching fraction in the low dimuon mass region. The associated systematic uncertainty is estimated by replacing this decay with a model assuming a uniform phase-space distribution, but still with one of the muon pairs having a mass below 980 MeV/c 2 . This results in a 4.6% systematic uncertainty. Using the model from Ref. [11] results in a smaller variation. Differences in simulation and data for the ratio of trigger efficiencies between the signal and normalisation channels gives rise to a systematic uncertainty as well. The effect is evaluated by comparing the difference between the trigger efficiency of B + → J/ψ K + decays in simulation and data, yielding a 3.5% systematic uncertainty. This value represents a conservative estimate since it does not take into account an expected cancellation between signal and normalisation modes. The uncertainty in the branching fraction of the normalisation mode leads to a 3% uncertainty.
Another difference between the signal and the normalisation channels is that the kaon in the decay B + → J/ψ K + can undergo nuclear interactions in the detector with a probability proportional to the amount of material traversed and thus have a lower tracking efficiency. Following the procedure outlined in Ref. [35], the uncertainty on this amount of material leads to a 2% systematic uncertainty.
Inaccuracies in the modelling of the B + production kinematics lead to differences in efficiency between the signal and the normalisation channels. To account for this, correction weights to the B + meson momentum for the simulated samples are calculated using the measured distribution from B + → J/ψ K + decays. The difference of 1.5% in the relative efficiency between the signal and the normalisation channels, compared to the case where no weights are applied, is assigned as a systematic uncertainty.
Other smaller systematic uncertainties are assigned to account for a small fit bias due to the low amount of data available and the finite size of the simulation samples.
In the fit for the signal yield, all systematic uncertainties, apart from the variation in the background shape, affect the efficiency ratio and are added as Gaussian constraints on the relevant efficiency ratios when calculating the limit. They are assumed to be fully correlated between the bins of fractional corrected mass uncertainty and uncorrelated between the different effects. For the background shape, the increased freedom in the shape leads to a larger uncertainty in the signal yield. The likelihood distribution used for determining the limit is stretched by the relative change in uncertainty around the minimum to reflect this.

Conclusions
A search has been performed for the rare leptonic decay B + → µ + µ − µ + ν µ , using 4.7 fb −1 of proton-proton collision data collected by the LHCb experiment. No signal is observed for the B + → µ + µ − µ + ν µ decay and an upper limit of 1.6 × 10 −8 at 95% confidence level is set on the branching fraction, where the lowest of the two µ + µ − mass combinations is below 980 MeV/c 2 . Under the assumption that the decay is dominated by intermediate vector mesons, the limit for the full kinematic region stays the same. The limit on the branching fraction is in tension with a recent theoretical calculation based on the vector-dominance model [11].