1 Introduction

Leptonic decays of the \({{B} ^+} \) meson are rare, as branching fractions are proportional to the squared magnitude of the small Cabibbo–Kobayashi–Maskawa (CKM) matrix element \(V_{ub}\). Among these processes, the decays \(B^{+}\rightarrow {\uptau ^+} {{\upnu } _\tau } \) and \({{{B} ^+}} \rightarrow {\mu ^+} {{\upnu } _\mu } \) have precise Standard Model (SM) predictions [1] given the absence of hadrons in the final state.Footnote 1 Due to helicity suppression, they are also highly sensitive to particles predicted in extensions of the SM such as charged scalars [2]. Measurements of the \({{{B} ^+}} \rightarrow {\uptau ^+} {{\upnu } _\tau } \) decay from the \(B \) factories [3,4,5,6] lead to an average branching fraction of \((1.4\pm 0.3)\times 10^{-4}\) [7] consistent with the SM prediction within the experimental uncertainty. An upper limit of \(1.1\times 10^{-6}\) [8] is set on the \({{{B} ^+}} \rightarrow {\mu ^+} {{\upnu } _\mu } \) branching fraction at 90% confidence level.

The radiative version of the muonic decay, \({{{B} ^+}} \rightarrow {\mu ^+} {{\upnu } _\mu } \gamma \), is important for two reasons; it is a background for the \({{{B} ^+}} \rightarrow {\mu ^+} {{\upnu } _\mu } \) decay, and its branching fraction is a direct measurement of the inverse moment of the \(B \) meson light cone distribution amplitude, which is very difficult to calculate theoretically [9]. The upper limit on the branching fraction for the \({{{B} ^+}} \rightarrow {\mu ^+} {{\upnu } _\mu } \gamma \) decay is \(3.0\times 10^{-6}\) [10] at 90% confidence level.

A B decay vertex with just a single charged particle makes a search for the \({{{B} ^+}} \!\rightarrow \mu ^{+}\nu _\mu \) and \({{{B} ^+}} \rightarrow {\mu ^+} {{\upnu } _\mu } \gamma \) decays highly challenging in the LHC environment. This problem is not present for the decay \({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } \), depicted in Fig. 1. The decay receives a contribution from the \({{{B} ^+}} \rightarrow {\mu ^+} {{\upnu } _\mu } \gamma ^{*}\) with \(\gamma ^{*}\rightarrow {\mu ^+\mu ^-} \) amplitude, where the annihilation to the \({\mu ^+} {{\upnu } _\mu } \) pair occurs through an intermediate \({B} ^*\) meson. It also receives contributions from the \({{{B} ^+}} \rightarrow {\mu ^+} {{\upnu } _\mu } V\) amplitude, where V denotes a vector meson such as the \(\omega \) or the \(\rho \), that can decay to a pair of muons. With these contributions, nearly all decays have a muon pair with a mass below \(1{\mathrm {\,GeV\!/}c^2} \). The only theoretical calculation available is based on vector-meson dominance and predicts that the corresponding branching fraction, \(\mathcal {B}({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } )\), is around \(1.3 \times 10^{-7}\) [11].

Fig. 1
figure 1

Feynman diagrams of the contributions (top left) \({{{B} ^+}} \rightarrow {\mu ^+} {{\upnu } _\mu } \gamma ^{*}\) with \(\gamma ^{*}\rightarrow {\mu ^+\mu ^-} \), (top right) \({{{B} ^+}} \rightarrow {\mu ^+} {{\upnu } _\mu } V\) and (bottom) bremsstrahlung to the \({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } \) decay

This paper describes a search for the decay \({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } \) using a partial reconstruction method that infers the momentum of the missing neutrino to obtain a mass estimate of \({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } \) decays. This search uses proton-proton (pp) collision data corresponding to an integrated luminosity of \(4.7\,\text{ fb }^{-1} \) collected during the three periods 2011 (\(7~\hbox {TeV} \) collision energy), 2012 (\(8~\hbox {TeV} \)) and 2016 (\(13~\hbox {TeV} \)) at the LHCb experiment. The detector is described in Sect. 2, followed by a description of how the signal is separated from backgrounds using two multivariate classifiers in Sect. 3. The evaluation of the background is covered in Sect. 4, the normalisation of the branching fraction of the signal to the decay \({{{B} ^+}} \!\rightarrow {{J/\psi }} {{K} ^+} \) with \({{J/\psi }} \!\rightarrow {\mu ^+} {\mu ^-} \) in Sect. 5, the limit on the branching fraction in Sect. 6 and the systematic uncertainties in Sect. 7. Finally, conclusions are presented in Sect. 8.

2 Detector and simulation

The LHCb detector [12, 13] is a single-arm forward spectrometer covering the pseudorapidity range \(2<\eta <5\), designed for the study of particles containing \(b \) or \(c \) quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region [14], a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about \(4{\mathrm {\,Tm}}\), and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of the momentum, \(p\), of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at \(200\,{\mathrm {GeV/c}} \). The minimum distance of a track to a primary vertex (PV), the impact parameter (IP), is measured with a resolution of \((15+29/p_{\mathrm { T}}){\,\upmu \mathrm {m}} \), where \(p_{\mathrm { T}}\) is the component of the momentum transverse to the beam, in \(\,{\mathrm {GeV/c}}\). The secondary vertex (SV) resolution for three-body decays is around \(20{\,\upmu \mathrm {m}} \) in the plane transverse to the beam axis and \(200{\,\upmu \mathrm {m}} \) along the beam axis.

Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors [15]. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers [16].

The online event selection is performed by a multistage trigger [17]. For the analysis described here, the events are first required to pass a hardware trigger, selecting events containing at least one muon with high \(p_{\mathrm { T}}\). In the subsequent software trigger at least one muon candidate is required to have high \(p_{\mathrm { T}}\) and a large impact parameter with respect to any PV. The dominant path through the last level of the trigger is a selection that requires a SV consisting of two muons with a high combined mass.

Simulated events are used to optimise the signal selection, estimate background contamination as well as calculate the relative efficiency between the signal and a normalisation channel. In the simulation, pp collisions are generated using Pythia [18, 19] with a specific LHCb configuration [20]. Decays of hadronic particles are described by EvtGen [21], in which final-state radiation is generated using Photos [22]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [23, 24] as described in Ref. [25].

Three different models are used in the simulation for the \({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } \) decay. The nominal model, with which efficiency for signal selection (\(\varepsilon ({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } )\)) is calculated, has a photon pole for one of the muon pairs and a uniform mass distribution for the combination of the third muon and the neutrino. For systematic checks, a flat phase space model is used. As a third model, the recently proposed vector-meson dominance model for the decay is used [11].

3 Selection

Signal \({{B} ^+} \) decay candidates are reconstructed by combining one negatively and two positively charged tracks. These tracks are required to be of good quality, be inconsistent with originating from any PV, be positively identified as muons and form a good-quality SV displaced from any PV. The PV with the smallest \({\chi }_{\text {IP}}^2 \) is the associated PV, where \({\chi }_{\text {IP}}^2\) is defined as the difference in the vertex-fit \(\chi ^2\) of a given PV reconstructed with and without the \({{B} ^+} \) trajectory included. The momentum vector of the \({{{B} ^+}} \) decay products is required to point in the same direction as the line connecting the associated PV and the SV with an allowance made for the momentum that is carried away by the neutrino in the decay.

At most one hit in the muon stations is allowed to be shared between two different muon candidates. This reduces the rate of hadrons misidentified as muons when there is already a muon of the same sign in the detector. In this analysis that has two muons of the same sign in the final state it is essential to reduce this type of misidentification. The search for the signal is performed in the region where the lower of the two \({\mu ^+\mu ^-} \) mass combinations is below 980\({\mathrm {\,MeV\!/}c^2}\) to avoid potential background from \(\phi \!\rightarrow {\mu ^+} {\mu ^-} \) decays. Moreover, above this mass the combinatorial background grows and the expected signal yield is minimal, making a search there difficult. Backgrounds originating from candidates involving \({J/\psi }\) and \(\psi {(2S)}\) decays are removed by vetoing the mass regions \(2946{\mathrm {\,MeV\!/}c^2}< M_{{\mu ^+} {\mu ^-}} < 3176{\mathrm {\,MeV\!/}c^2} \) and \(3586{\mathrm {\,MeV\!/}c^2}< M_{{\mu ^+} {\mu ^-}} < 3766{\mathrm {\,MeV\!/}c^2} \) of the higher of the two \({\mu ^+\mu ^-} \) mass combinations. Finally, a tight particle identification (PID) selection, based on a neural network, is applied to reject misidentified hadrons.

The missing neutrino in the reconstruction of the \({{B} ^+} \) candidate is accounted for with the addition of the momentum component perpendicular to \(B \) meson flight direction, \(p_\perp \). This direction is determined from the position of the PV where the \({{B} ^+} \) meson is produced and the SV where it decays. The resulting corrected mass is defined as,

$$\begin{aligned} M_{\mathrm{corr}} = \sqrt{M_{\mu \mu \mu }^{2} + |p_{\perp }|^2} + |p_{\perp }|, \end{aligned}$$
(1)

where \(M_{\mu \mu \mu }\) is the mass of the three muons. Candidates are kept if they satisfy \(4000{\mathrm {\,MeV\!/}c^2}< M_{\mathrm{corr}} < 7000{\mathrm {\,MeV\!/}c^2} \). Inside this a signal region is defined as \(4500{\mathrm {\,MeV\!/}c^2}< M_{\mathrm{corr}} < 5500{\mathrm {\,MeV\!/}c^2} \). To avoid any bias in the development of the signal selection algorithm, the data in this region was not analysed until the selection was finalised and the systematic uncertainties evaluated. The uncertainty on the corrected mass is dominated by the resolution of the SV.

To reduce combinatorial background, where random tracks are combined to emulate the signal, a boosted decision tree classifier (BDT) [26] with the AdaBoost algorithm [27] as implemented in the TMVA toolkit [28, 29] is used. The BDT classifier is trained using simulation as a signal sample and the upper sideband \(M_{\mathrm{corr}} > 5500{\mathrm {\,MeV\!/}c^2} \) of data as a proxy of the combinatorial background candidates. To best exploit the limited amount of data available for training, a ten-fold cross-validation method [30] is employed. The BDT contains information about kinematic and geometric properties of the \({{{B} ^+}} \) candidate and associated muon tracks together with the total number of reconstructed tracks in the event. The most distinguishing properties between signal and combinatorial background candidates are the isolation of the decay vertex (as described in Ref. [31]), the \(\chi ^2\) of the \({{{B} ^+}} \) vertex, and the \({\chi }_{\text {IP}}^2\) with respect to the associated PV for all three muon candidates. The requirement on the BDT response is optimised by maximising the figure of merit \(\frac{\varepsilon _{S}}{\sqrt{n_B}+3/2}\) [32] where \(\varepsilon _{S}\) is the signal efficiency of the selection and \(n_B\) refers to the estimated number of background candidates in the signal region. The optimal BDT working point is \(40\%\) efficient on simulated signal events while rejecting \(99\%\) of the combinatorial background. For the optimisation, only relative changes in signal efficiency are relevant and these are obtained from the simulation.

A second BDT is trained to reject contamination from misidentified background. This background originates mostly from cascade decays where a \(b \) hadron undergoes a semileptonic decay through the dominant \(b \) to \(c \) transition and the resulting \(c \) hadron also decays semileptonically. The second BDT shares the same architecture, features and working-point optimisation strategy as the BDT designed to reject combinatorial background. It is trained on a background sample selected in data where two tracks are positively identified as muons and the third track is required to be in the fiducial region covered by the muon chambers but with a veto on muon identification. The signal sample is using the simulated sample after it has been accepted by the first BDT. The optimisation results in that \(40\%\) of the signal sample is retained and \(94\%\) of the misidentified background is rejected.

The overall selection results in 1797 candidates. There are no events with multiple candidates. The total efficiency for selecting the signal is about 0.1%.

4 Background estimation

The main categories of background are: combinatorial; misidentified combinations, where two muons are correctly identified but the third particle is a misidentified hadron; and partially reconstructed that have an almost identical final state to the signal.

As the combinatorial background arises from random combinations of three correctly identified muons, it has no peaking features in the considered region of corrected mass. Its contribution is estimated as part of the final fit to the data.

In order to estimate the number of misidentified background candidates and their distribution in the \(M_{\mathrm{corr}}\) variable, a data sample is obtained with the same selection as for the signal, apart from a reversal of the muon identification requirements for one of the candidate tracks. This track is still required to be within the fiducial volume of the muon detector. This selects a sample of \(\mu ^+\mu ^\pm h X\) candidates in data, where h denotes any hadron of either negative or positive charge. The sample is a mixture of partially reconstructed \(b \)-hadron decays, where both the \(b \)-hadron and the subsequent charm hadron decays semileptonically, and combinatorial background. Backgrounds where two hadrons are identified as muons are only contributing to the selected events at an insignificant level.

Probabilities of misidentifying hadrons as muons are obtained from data as a function of momentum and pseudorapidity by using control samples where the hadron species are determined purely from the kinematic properties of the decay chain [15]. As the misidentification probability is different for pions, kaons and protons [33], the species of the hadron must be determined. This is done by isolating the hadrons in the \({\mu ^+\mu ^-} h X\) sample into separate hadron PID regions and then taking into account the cross-feed, calculated using an iterative approach, between these regions. The iterative approach splits the data sample into three PID regions, where the hadron candidate is consistent with the kaon, pion and proton hypotheses, respectively. Initially, the number of misidentified candidates of a given species is assumed to be zero, and the cross-feed between regions is calculated. From this first estimate of the number of misidentified particles in each of the PID regions, the cross-feed can then be recalculated. The process repeats until the number of total misidentified particles does not change significantly from one iteration to the next when compared to the statistical uncertainty from the sample size.

Once the cross-feed between the different hadron species has been taken into account, the probability for a specific hadron to pass the stringent muon PID requirements applied in the analysis is calculated. The presence of the two real muons in the \({\mu ^+\mu ^-} h X\) background increases the probability to misidentify the hadron as a muon, mainly due to hit sharing in the muon stations. To take this into account, the hadron misidentification probability is obtained using the decay \({{B} ^0} \rightarrow {{J/\psi }} {{K} ^{*0}} \), with \({{J/\psi }} \!\rightarrow {\mu ^+} {\mu ^-} \) and \({{K} ^{*0}} \!\rightarrow {{K} ^+} {{\pi } ^-} \), as a calibration sample. It has two muons present as in the signal, and the kaon and pion can be identified without PID requirements on the particle under consideration. In this way the probability of identifying the kaon or the pion from the \({K} ^{*0}\) decay as a muon can be measured. Double misidentification in the calibration sample, where the kaon and pion hypotheses are swapped, is reduced by requiring a loose hadron identification on the hadron not under consideration for misidentification and subsequently fitted for. The background coming from protons misidentified as muons is insignificant, requiring no further action.

The final distribution of the misidentified background in \(M_{\mathrm{corr}}\) is obtained by multiplying the sample with the muon identification reversed with the relevant \(h\rightarrow \mu \) misidentification probabilities.

The level of partially reconstructed backgrounds, where three muons are correctly identified but one or more particles in addition to a neutrino are not reconstructed, is determined using simulation. An example of this type of decay is \({{B} \rightarrow {{\overline{D}{}} {}^0} {\mu ^+} \nu _{\mu } X}\) where \({{\overline{D}{}} {}^0} \rightarrow {{K} ^+} {{\pi } ^-} {\mu ^+} {\mu ^-} \) and the \(K^{+}\), \(\pi ^{-}\) and X particles are not reconstructed. For this particular background, the measurements of the branching fractions of \({{\overline{D}{}} {}^0} \rightarrow K^{+} \pi ^{-} \mu ^{+} \mu ^{-}\) [34] and \({B} \rightarrow {{\overline{D}{}} {}^0} \mu ^{+} \nu _{\mu } X\) [35] are used. In total, partially reconstructed backgrounds are estimated at the level of eleven candidates in the signal region of corrected mass.

Other potential backgrounds are considered. The decay \({{{B} ^+}} \!\rightarrow {{K} ^+} {\mu ^+} {\mu ^-} \) with the kaon misidentified as a muon contributes in candidates with a corrected mass outside the signal region. This is not the case for the \({{{B} ^+}} \!\rightarrow {{\pi } ^+} {\mu ^+} {\mu ^-} \) decay, but the low branching fraction combined with the requirement for misindentification of the pion results in a negligible background level. The decay \({{{B} ^+}} \!\rightarrow \eta ^{(')}{\mu ^+} \nu _\mu \), followed by the decay \(\eta ^{(')}\!\rightarrow {\mu ^+} {\mu ^-} \gamma \), is also considered and found to be at a negligible level after the selection criteria are applied. Finally, backgrounds that involve a charmonium state decaying to a pair of muons are excluded by the previously mentioned vetos on the \({{J/\psi }} \) and \({\psi {(2S)}} \) masses.

5 Normalisation method

The branching fraction of a \({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } \) signal is obtained by normalising to the \({{{B} ^+}} \!\rightarrow {{{J/\psi }}}(\!\rightarrow {\mu ^+} {\mu ^-} ){{K} ^+} \) decay as

$$\begin{aligned}&\mathcal {B}({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } ) \nonumber \\&\quad =\mathcal {B}({{{B} ^+}} \!\rightarrow {{J/\psi }} {{K} ^+} ) \times \mathcal {B}({{J/\psi }} \!\rightarrow {\mu ^+} {\mu ^-} ) \nonumber \\&\qquad \times \frac{\varepsilon ({{{B} ^+}} \!\rightarrow {{J/\psi }} {{K} ^+} )}{\varepsilon ({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } )} \times \frac{N({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } )}{N({{{B} ^+}} \!\rightarrow {{J/\psi }} {{K} ^+} )},\nonumber \\ \end{aligned}$$
(2)

where N is the yield of the decay, \(\varepsilon \) is the overall efficiency to reconstruct and select the decay. The braching fractions are taken from Ref. [35].

The \({{{B} ^+}} \!\rightarrow {{J/\psi }} {{K} ^+} \) candidates are selected in the same way as the signal, except that the third particle must be consistent with the kaon hypothesis and the dimuon mass consistent with the \({{J/\psi }} \) mass. This reduces the impact of systematic uncertainties related to the ratio of efficiencies in Eq. (2). Most of the signal and normalisation selection efficiencies are estimated using simulation. Efficiencies of the PID are obtained using control data samples where identities of the final-state particles can be deduced from the kinematics of the decay. The total efficiency of the \({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } \) signal is around \(37\%\) relative to the normalisation channel. This lower efficiency is caused by the lower dimuon mass for the signal that affects the trigger, reconstruction and BDT efficiencies. The muon PID requirements are also less efficient due to the sharing of muon hits between the different final-state muons in the signal decay.

Fig. 2
figure 2

Fit to the mass distribution of the selected \({{{B} ^+}} \!\rightarrow {{J/\psi }} {{K} ^+} \) candidates. The combinatorial background (purple) and misidentified \({{{B} ^+}} \!\rightarrow {{J/\psi }} {{\pi } ^+} \) decays (orange) are stacked up while the \({{{B} ^+}} \!\rightarrow {{J/\psi }} {{K} ^+} \) signal is shown as a dashed line. The data points are shown as black points with the total fit overlaid as a red solid line

The \({{{B} ^+}} \!\rightarrow {{J/\psi }} {{K} ^+} \) yield is determined by performing an unbinned extended maximum-likelihood fit to the \(\mu ^{+} \mu ^{-} K^{+}\) mass distribution. The shape of the \({{{B} ^+}} \!\rightarrow {{J/\psi }} {{K} ^+} \) mass distribution is described by a Hypatia function [36] that accounts for non-Gaussian tails on both sides of the peak. In the fit, the mean and width parameters are allowed to vary and all other parameters are determined from simulation. The shape of the misidentified background contribution of \({{{B} ^+}} \!\rightarrow {{J/\psi }} {{\pi } ^+} \) decays is modelled with a Gaussian core with power law tails on each side of the peak. The mean and width are allowed to vary freely in the fit while the tail parameters are determined from simulation. Combinatorial background is parameterised with an exponential function with a decay constant that is allowed to vary in the fit. The result of the fit is shown in Fig. 2 and yields \(2.7\times 10^5 \, {{{B} ^+}} \!\rightarrow {{J/\psi }} {{K} ^+} \) decays.

6 Signal yield determination

In order to determine the \({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } \) signal yield, an extended unbinned maximum-likelihood fit is performed to the corrected mass distribution. To improve the sensitivity of the mass fit, an event-by-event uncertainty on the corrected mass is calculated by propagating the uncertainties of the PV and SV. The data is then split into two equally sized regions with high and low fractional corrected mass uncertainties. This improves the branching fraction sensitivity by approximately 11% due to the different signal distributions in the two samples, as shown in Fig. 3.

Fig. 3
figure 3

Template distributions for signal and misidentified background shapes for high and low fractional corrected mass uncertainty. A low uncertainty on the corrected mass corresponds to data with better mass resolution. The shape of the misidentification template is obtained from a control sample while the signal template is obtained from simulation. A systematic uncertainty on the signal shape due to the choice of the signal model is not shown, as it is too small to be visible

The signal shape is modelled with the sum of two Gaussian functions with power law tails, where the tails are on both sides of the peak. The parameters of the signal shape are determined using simulation and kept fixed in the subsequent fit to the data. The combinatorial background is modelled using an exponential function, whose slope is allowed to vary in the fit and whose parameterisation is verified using simulation. The yield is left free to float in the fit.

The background from misidentified muons is obtained from the \({\mu ^+\mu ^-} h X\) control sample described in Sect. 4. The distribution and yield of this sample is fitted to a Gaussian function with a power-law tail at high corrected mass. This parameterisation is cross-checked by fitting a sample with a looser muon identification requirement. The uncertainties on the associated parameters are propagated to the fit using a multivariate Gaussian constraint. The shape and the yield of the partially reconstructed background are taken from simulation. Yields that are obtained from control samples and simulation are allowed to vary in the fit within constraints from a Poisson distribution.

Fig. 4
figure 4

Corrected mass distribution of the selected \({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } \) candidates with the fit overlaid. Samples with low and high corrected mass uncertainty are fitted as individual samples but are merged in the figure. The fit has components for (green) combinatorial background, (blue) misidentified candidates and (orange) partially reconstructed candidates. The signal component is not visible as the fitted signal yield is negative. The best fit is the solid red line while the dashed line shows how the total would have looked like if the signal had the branching fraction predicted in Ref. [11]

The fit to the corrected-mass distribution, combining both corrected-mass uncertainty categories, is shown in Fig. 4. The signal yield is negative, \(-25\pm 16\), resulting in the total fit component being slightly below the sum of the background contributions. As there is no significant signal component, a limit on the branching fraction,

$$\begin{aligned} {\mathcal {B}} ({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} \nu _\mu ) < 1.6\times 10^{-8} \end{aligned}$$

at 95% confidence level is set using the \(\mathrm {CL}_{\mathrm{s}}\) method [37]. From pseudoexperiments, the expected upper limit is found to be \(2.8\times 10^{-8}\) and the present result represents a downward fluctuation of \(1.4\sigma \). Systematic uncertainties are included in this limit and are discussed in the following section.

Table 1 Summary of systematic uncertainties. Numbers are on the relative uncertainty of the normalisation for the branching fraction of the signal

7 Systematic uncertainties

A summary of the systematic uncertainties is given in Table 1, yielding a total relative uncertainty of \(16\%\) on the normalisation of the branching fraction of the signal.

The largest systematic uncertainty arises due to the choice of the shape for the combinatorial background. If the combinatorial background is allowed to have two components with different exponential slope, the upper limit on the branching fraction changes by 14.2%. While the fit does not improve from adding in an extra component, its existence cannot be excluded from the fit to the data.

In simulation, the nominal signal model, as described in Sect. 2, creates a photon pole, increasing the branching fraction in the low dimuon mass region. The associated systematic uncertainty is estimated by replacing this decay with a model assuming a uniform phase-space distribution, but still with one of the muon pairs having a mass below \(980{\mathrm {\,MeV\!/}c^2} \). This results in a \(4.6\%\) systematic uncertainty. Using the model from Ref. [11] results in a smaller variation.

Differences in simulation and data for the ratio of trigger efficiencies between the signal and normalisation channels gives rise to a systematic uncertainty as well. The effect is evaluated by comparing the difference between the trigger efficiency of \({{{B} ^+}} \!\rightarrow {{J/\psi }} {{K} ^+} \) decays in simulation and data, yielding a \(3.5\%\) systematic uncertainty. This value represents a conservative estimate since it does not take into account an expected cancellation between signal and normalisation modes. The uncertainty in the branching fraction of the normalisation mode leads to a 3% uncertainty.

Another difference between the signal and the normalisation channels is that the kaon in the decay \({{{B} ^+}} \!\rightarrow {{J/\psi }} {{K} ^+} \) can undergo nuclear interactions in the detector with a probability proportional to the amount of material traversed and thus have a lower tracking efficiency. Following the procedure outlined in Ref. [38], the uncertainty on this amount of material leads to a 2% systematic uncertainty.

Inaccuracies in the modelling of the \({{{B} ^+}} \) production kinematics lead to differences in efficiency between the signal and the normalisation channels. To account for this, correction weights to the \({{B} ^+} \) meson momentum for the simulated samples are calculated using the measured distribution from \({{{B} ^+}} \!\rightarrow {{J/\psi }} {{K} ^+} \) decays. The difference of 1.5% in the relative efficiency between the signal and the normalisation channels, compared to the case where no weights are applied, is assigned as a systematic uncertainty.

Other smaller systematic uncertainties are assigned to account for a small fit bias due to the low amount of data available and the finite size of the simulation samples.

In the fit for the signal yield, all systematic uncertainties, apart from the variation in the background shape, affect the efficiency ratio and are added as Gaussian constraints on the relevant efficiency ratios when calculating the limit. They are assumed to be fully correlated between the bins of fractional corrected mass uncertainty and uncorrelated between the different effects. For the background shape, the increased freedom in the shape leads to a larger uncertainty in the signal yield. The likelihood distribution used for determining the limit is stretched by the relative change in uncertainty around the minimum to reflect this.

8 Conclusions

A search has been performed for the rare leptonic decay \({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } \), using \(4.7\,\text{ fb }^{-1} \) of proton-proton collision data collected by the LHCb experiment. No signal is observed for the \({{{B} ^+}} \!\rightarrow {\mu ^+} {\mu ^-} {\mu ^+} {{\upnu } _\mu } \) decay and an upper limit of \(1.6\times 10^{-8}\) at 95% confidence level is set on the branching fraction, where the lowest of the two \({\mu ^+\mu ^-} \) mass combinations is below \(980{\mathrm {\,MeV\!/}c^2} \). The limit for the full kinematic region stays the same under the assumption that the decay is dominated by intermediate vector mesons.