Searches for rare $B_s^0$ and $B^0$ decays into four muons

Searches for rare $B_s^0$ and $B^0$ decays into four muons are performed using proton-proton collision data recorded by the LHCb experiment, corresponding to an integrated luminosity of 9 $\text{fb}^{-1}$. Direct decays and decays via light scalar and $J/\psi$ resonances are considered. No evidence for the six decays searched for is found and upper limits at the 95% confidence level on their branching fractions ranging between $1.8\times10^{-10}$ and $2.6\times10^{-9}$ are set.

However, new particles beyond the Standard Model (BSM) may significantly enhance these branching fractions. For example, decays via scalar and pseudoscalar sgoldstino particles into a pair of dimuons in the Minimal Supersymmetric Standard Model (MSSM) may lead to significant enhancements of the branching fractions [2]. Furthermore, rare B 0 s and B 0 decays into a pair of dimuons mediated by BSM light narrow scalar resonances (a) of the form B 0 (s) → a (µ + µ − ) a (µ + µ − ) naturally occur in extensions of the SM involving a new strongly interacting sector. In particular, such models [3,4] can account for the long-standing tension between the SM prediction [5] and the observed [6,7] value of the anomalous magnetic dipole moment of the muon, as well as the widely discussed anomalies in b → s + − transitions [8][9][10][11][12][13][14]. This motivates a search for B decays into two light scalars with masses around 1 GeV/c 2 [15].
In addition, searches are performed for B 0 (s) → J/ψ (µ + µ − ) µ + µ − decays, which are tree-level b → c transitions that proceed via W boson exchange, as demonstrated in Fig. 2. No SM predictions for these branching fractions are currently available. However, an estimate for the B 0 s decay may be obtained using a prediction for the branching fraction of the B 0 s → J/ψe + e − decay [17], the measured J/ψ → µ + µ − branching fraction [18], and assuming that the B 0 The equivalent B 0 branching fraction may be estimated by multiplying this value by Since these branching fractions are significantly below the sensitivity of this analysis, the presence of a signal would indicate BSM physics.
The branching fractions are measured using B 0 s → J/ψ (µ + µ − ) φ (µ + µ − ) decays as a normalisation channel, which has a measured branching fraction of (1.74 ± 0.14) × 10 −8 [18,19]. Since this decay has the same final state as the signal modes, many sources of systematic uncertainty cancel. First, the yield of the control mode is estimated using an unbinned maximumlikelihood fit to its invariant mass spectrum. Then, searches for B 0 (s) → µ + µ − µ + µ − and B 0 (s) → J/ψ (µ + µ − ) µ + µ − decays are made using unbinned maximum-likelihood fits to the invariant mass spectra of each decay. These are performed simultaneously in multiple intervals of the response of a multivariate classifier constructed to separate signal and combinatorial background, where combinatorial background arises from muons that do not all originate from the same b-hadron decay. In contrast, B 0 (s) → a (µ + µ − ) a (µ + µ − ) decays are searched for by imposing a minimum requirement on the multivariate classifier response due to the very low levels of background, followed by a single fit to the mass spectrum of each signal mode. For all six signal modes, the branching fractions are normalised using the yield of the control mode, its measured branching fraction, and the ratio of efficiencies between the signal and control modes calculated using simulated  events. In the case of B 0 decays, the ratio of fragmentation fractions f s /f d is also taken into account.

Detector and simulation
The LHCb detector is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, described in detail in Refs. [20,21]. It includes a high-precision tracking system consisting of a silicon-strip vertex locator (VELO), surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. Particle identification is provided by two ring-imaging Cherenkov (RICH) detectors, an electromagnetic and a hadronic calorimeter, and a muon system composed of alternating layers of iron and multiwire proportional chambers.
In the simulation, pp collisions are generated using Pythia [22] with a specific LHCb configuration [23]. Decays of unstable particles are described by EvtGen [24], in which final-state radiation is generated using Photos [25]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [26] as described in Ref. [27]. Simulated B 0 (s) → µ + µ − µ + µ − decays are generated using a phase-space model for the kinematics of the muons, due to the lack of a SM prediction for the decay dynamics. The B 0 (s) → a (µ + µ − ) a (µ + µ − ) decays are simulated with an a state with m a = 1 GeV/c 2 , a lifetime of 1 fs and a natural width of zero. With these settings it is assumed that the a natural width is significantly below the experimental resolution of approximately 20 MeV/c 2 . Simulated B 0 (s) → J/ψ (µ + µ − ) µ + µ − decays are generated using the BTOSLLBALL model [28].

Event selection
The online selection is composed of a hardware trigger, followed by a two-stage software trigger. Candidate B 0 (s) → µ + µ − µ + µ − decays are required to pass triggers designed to select decays involving muons, which are identified from tracks that penetrate the calorimeters and the iron layers between the muon stations. In the hardware stage, candidates must pass at least one of two selections: one requiring the presence of at least one high transverse momentum (p T ) muon, the other requiring a pair of muons with a product of their respective transverse momenta above a threshold. In the first software trigger stage, candidates must include a high-p T muon that is inconsistent with having originated at a primary pp interaction vertex (PV). In the second stage, requirements are imposed on pairs of muons in order to select candidates consistent with B meson decay vertices that are significantly displaced from the PV.
Candidate B 0 (s) → µ + µ − µ + µ − decays are formed by combining four tracks identified as muons that originate from a common decay vertex. The muons forming the candidate are all required to be inconsistent with originating from a PV, have a p T > 250 MeV/c, and a good track fit quality. The maximum distance of closest approach between the four tracks is required to be below 0.3 mm. The resulting B candidates are required to have an invariant mass within 1 GeV/c 2 of the known B 0 s mass [18] in the Run 1 dataset or larger than 4 GeV/c 2 in the Run 2 dataset, a good quality vertex fit, to be consistent with originating from a PV, have a significant flight distance, and the cosine of the angle between its momentum vector and the vector pointing from the PV to its decay vertex greater than zero. Hadronic background is suppressed using particle identification (PID) information provided by the muon system, calorimeters and RICH detectors, which is used to select well identified muons [29]. The different signal modes and the control channel are separated using requirements on the four invariant masses of pairs of oppositely charged muons q ij , where i and j index the positively and negatively charged muons, respectively, such that ij can take the values {11, 12, 21, 22}. Regions around the φ, J/ψ and ψ(2S) mesons are defined as 950 MeV/c 2 < q ij < 1090 MeV/c 2 , |q ij − m(J/ψ)| < 100 MeV/c 2 and |q ij − m(ψ(2S))| < 100 MeV/c 2 , respectively, where m(J/ψ) and m(ψ(2S)) are the known J/ψ and ψ(2S) masses [18]. The B 0 (s) → µ + µ − µ + µ − signal selection requires that none of the four pairs of opposite-sign muons has an invariant mass within any of the φ, J/ψ or ψ(2S) regions. This requirement is approximately 64% efficient on simulated B 0 s → µ + µ − µ + µ − candidates passing the trigger and offline selection, while rejecting 99.94% of the The control mode selection requires that at least one of the mutually exclusive pairs of dimuons (i.e. µ + 1 µ − 1 and µ + 2 µ − 2 or µ + 1 µ − 2 and µ + 2 µ − 1 ) has one dimuon mass in the φ region while the other falls in the J/ψ region. This selection is around 95% efficient on simulated decays are selected by requiring two mutually exclusive pairs of opposite-sign muons with similar invariant masses, satisfying where σ(q 2 ij ) is the dimuon invariant mass squared resolution evaluated at q 2 ij . The resolution increases roughly linearly as a function of q 2 , from around 0.003 GeV 2 to around 0.15 GeV 2 , and is estimated as a function of q 2 separately in each data-taking year using simulated B 0 s → µ + µ − µ + µ − decays. This requirement does not completely eliminate contamination from B 0 s → J/ψ (µ + µ − ) φ (µ + µ − ) decays, which may satisfy the requirement if muons from the intermediate J/ψ and φ mesons are paired incorrectly, giving two intermediate combinations with similar masses. This remaining background is vetoed by requiring that none of the four pairs of oppositely charged muons has an invariant mass in the J/ψ region.
The B 0 (s) → J/ψ (µ + µ − ) µ + µ − candidates are selected by requiring that at least one of the pairs of opposite-sign muons has a mass in the J/ψ region. Meanwhile the corresponding opposite-sign pair of muons is required to have a mass that falls outside the φ region under both a dimuon and dikaon mass hypothesis, in order to remove background decays, the two muons not forming the J/ψ candidate are required to satisfy more stringent PID criteria.
Finally, a multivariate classifier based on a boosted decision tree (BDT) algorithm from the TMVA package [30] is used to separate signal from combinatorial background. The classifier is trained using a mixture of simulated B 0 s → µ + µ − µ + µ − and B 0 → µ + µ − µ + µ − decays as the signal proxy, while the background proxy comprises data with an invariant B mass, m(µ + µ − µ + µ − ), in the regions m(µ + µ − µ + µ − ) > 5426 MeV/c 2 and m(µ + µ − µ + µ − ) < 5020 MeV/c 2 . The simulated B 0 (s) → µ + µ − µ + µ − decays used in the training are corrected to improve agreement with data as described in Sec. 5. Two separate BDT classifiers are trained for the Run 1 and Run 2 datasets due to the different data-taking conditions. The size of the available training sample is maximised in each case using the k-folding technique [31], which ensures that the BDT used to classify a given event in the data background sample is trained independent of the event itself. The following properties of the B candidate are taken as inputs to the classifiers: the logarithm of its impact parameter χ 2 with respect to the associated PV, the logarithm of its flight distance χ 2 , its flight distance, pseudorapidity, p T , decay time, decay vertex χ 2 per degree of freedom, and the minimum impact parameter χ 2 with respect to the associated PV of the four muons. Here, the impact parameter χ 2 is defined as the difference in the vertex-fit χ 2 of a given PV reconstructed with and without the track or candidate being considered. In Run 2, two variables reflecting the isolation of the B candidate from other tracks in the event are also included in the training: the changes in the decay vertex χ 2 when either one or two of the closest additional tracks in the underlying event are added to the decay vertex fit. Finally, the classifier response is transformed to give a uniform distribution between 0 and 1 for simulated decays the data are split into four intervals in the BDT classifier response with boundaries 0.2, 0.3, 0.5, 0.6 and 1.0, and a simultaneous fit to all four BDT intervals is used to search for the decays. Due to the low background levels in the B 0 (s) → a (µ + µ − ) a (µ + µ − ) search region, the BDT response in this case is only required to be greater than 0.1. Similarly, the selection for decays requires the BDT classifier response to exceed 0.05.

Background
A range of background sources have the potential to contaminate the signal search regions. The four-muon decays  [18], respectively, are reduced to negligible levels by the φ and J/ψ vetoes. Potential background to the , however the corresponding branching fraction of 1.5 × 10 −12 is well below the sensitivity of this search.
A second category of background arises from decays of b-hadrons of the form The typical probability for a hadron to be misidentified as a muon by the LHCb detector is around 1% [29], and this rate is further reduced by the PID requirements described in Sec. 3. Studies performed using simulation demonstrate that all these decays are reduced to negligible levels by the signal and the control sample selections. As such, this analysis is essentially free of background from other b-hadron decays, which are therefore not modelled in the fits for the signal and control decays.

Fit procedure
The signal yields in the branching fraction measurements are normalised with respect to the B 0 s → J/ψ (µ + µ − ) φ (µ + µ − ) control mode such that the branching fraction of a given signal decay is calculated as where the normalisation factors for B 0 s and B 0 modes, α n s and α n d , are defined as The symbols sig and norm refer to the signal and normalisation decays, respectively, B is the branching fraction, N is the yield, is the total efficiency to reconstruct and select a given decay mode and n indexes the BDT response intervals used in the simultaneous fits for B 0 The branching fraction of the B 0 s → J/ψ (µ + µ − ) φ (µ + µ − ) normalisation mode is calculated from the product of the branching fractions of the B 0 s → J/ψφ [19], J/ψ → µ + µ − and φ → µ + µ − [18] decays, yielding B norm = (1.74 ± 0.14) × 10 −8 . The ratio of fragmentation fractions is calculated as a weighted average of values measured at centre-of-mass energies of 7, 8 and 13 TeV [19], giving f s /f d = 0.250 ± 0.008. The correlation in the measurements of B (B 0 s → J/ψφ) and f s /f d is taken into account when calculating the uncertainty in the normalisation term α i d . The efficiencies are calculated using simulated decays, to which weights are applied in order to improve concordance with data. These weights are calculated by comparing B 0 s → J/ψ (µ + µ − ) φ (K + K − ) decays in data and simulation, for which the branching fraction is three orders of magnitude larger than for B 0 s → J/ψ (µ + µ − ) φ (µ + µ − ) and therefore allows for a much more precise determination of differences between data and simulation. The trigger and offline selections for these decays are similar to those used for B 0 s → J/ψ (µ + µ − ) φ (µ + µ − ) decays, but no PID requirement is applied on the kaons. The distributions of the variables of interest for B 0 s → J/ψ (µ + µ − ) φ (K + K − ) decays are separated from background in data using the sPlot method [32], using the B candidate invariant mass as the discriminating variable.
A first set of weights, referred to as the generator weights, corrects the p T and pseudorapidity distributions of B mesons as generated by Pythia 8 [22], along with the multiplicity of the underlying events. A second set of weights, referred to as the reconstruction weights, is used to correct the vertex χ 2 per degree of freedom and the impact parameter χ 2 of the B mesons. The efficiencies for each mode in each data-taking year are then calculated as where gen , presel and BDT refer to sums over the generated simulation sample before any reconstruction or selection, the offline reconstructed simulation sample passing the full selection excluding the BDT requirements, and the offline reconstructed simulation sample passing the full selection including the BDT requirements, respectively. Meanwhile, ω gen and ω rec refer to the generator and reconstruction weights. Note that the reconstruction weights are only used in the calculation of the efficiencies of the BDT requirements, since requirements on the B candidate's vertex χ 2 per degree of freedom and the impact parameter χ 2 imposed in prior stages of the selection are highly efficient for the signal modes. The individual efficiencies for each year are then combined in a weighted average according to the integrated luminosity recorded and the bb production cross-section in that year [33,34]. The invariant mass distribution of B 0 s → J/ψ (µ + µ − ) φ (µ + µ − ) candidates in the range 5100 < m(µ + µ − µ + µ − ) < 6000 MeV/c 2 is shown in Fig. 3. The yield of the normalisation mode is determined using an extended unbinned maximum-likelihood fit to this invariant mass spectrum. The B 0 s → J/ψ (µ + µ − ) φ (µ + µ − ) signal is modelled using the sum of two Crystal Ball functions [35] with common mean, referred to as a double Crystal Ball (DCB) function. The parameters of this function are determined from a fit to simulated B 0 s → J/ψ (µ + µ − ) φ (µ + µ − ) decays, in which the values are fixed in the fit to data, with the exception of the mean and widths of the DCB function, which are allowed to vary freely in order to account for differences in resolution and mass scale between data and simulation. The resulting ratio of the mass resolution in data over simulation is found to be 1.11 ± 0.09. The combinatorial background is modelled with an exponential function. Background from the six signal decays in the control mode mass distribution is negligible due to the low branching fractions of the signal decays and the requirements on the J/ψ and φ masses imposed in the control mode selection. The yield of B 0 s → J/ψ (µ + µ − ) φ (µ + µ − ) decays is found to be 218 ± 16, where the uncertainty is statistical only. The invariant mass spectrum and the fit projection are shown in Fig. 3.
As described in Sec. 3, the data used to search for B 0 (s) → µ + µ − µ + µ − and B 0 (s) → J/ψ (µ + µ − ) µ + µ − decays are split into four intervals in the BDT classifier re-5200 5400 5600 5800 6000 ]  sponse, while for the B 0 (s) → a (µ + µ − ) a (µ + µ − ) data sample a single BDT interval is defined. The normalisation factors α i s and α i d are calculated using Eq. 2 and the inputs described above, with their values Gaussian constrained to their central values according to their statistical uncertainties. Additional Gaussian constraints are used to include the effects of the systematic uncertainties in the inputs described in Sec. 6.
For each of the B 0 (s) → µ + µ − µ + µ − and B 0 (s) → J/ψ (µ + µ − ) µ + µ − signal modes, an unbinned extended maximum-likelihood fit is performed to the B candidate invariant mass distribution in the range 4900 < m(µ + µ − µ + µ − ) < 6000 MeV/c 2 , simultaneously in all four BDT intervals, in order to search for the decays and measure their branching fractions. In the case of the B 0 (s) → a (µ + µ − ) a (µ + µ − ) modes, fits are performed to the invariant mass distribution in a single region of the BDT response. All signal mass shapes are modelled using DCB functions, in which the parameters are fixed to values obtained from simulation, with the exception of the mean and widths, which are offset and scaled respectively according to the results of the fit for the control mode. Due to their overlap, the fits to the B 0 s and B 0 decays are performed separately in each case, i.e. the B 0 s component is set to zero when performing the fit to the B 0 component and vice versa. The yields of each signal mode are not fitted directly, rather they are expressed in terms of the normalisation factors and the decay branching fractions (see Eq. 2), which are allowed to vary freely. The branching fraction is constrained to be greater than or equal to zero in each fit. The combinatorial background is modelled using exponential functions, in which the slope parameters and yields are allowed to vary freely independently in each BDT interval.

Systematic uncertainties
Due to the similarity between the signal and control decay modes, many sources of systematic uncertainty cancel in the ratio of efficiencies. Nonetheless, a number of sources of systematic uncertainty have the potential to affect the measured branching fractions. Two sources of uncertainty arise from the models used to generate the simulated signal decays. In particular, since no theoretical prediction for the decay dynamics of B 0 (s) → µ + µ − µ + µ − decays is currently available, decays are generated using a phasespace model. This may lead to a bias in the efficiency calculation if the true kinematic distributions of these decays differs from the one used in the simulation. To estimate the representative scale of the bias arising from such an effect, the dimuon invariant mass squared (q 2 ij ) distributions for B 0 (s) → µ + µ − µ + µ − decays are weighted to be uniform in the (q 2 11 , q 2 22 ) plane, and the efficiencies are evaluated. A similar effect is evaluated for where the invariant mass squared of the two muons not originating from the J/ψ meson is weighted to give a uniform distribution. This leads to shifts relative to the nominal efficiencies of up to about 20% depending on the decay mode and BDT interval, which are taken as systematic uncertainties. This represents the largest source of systematic uncertainty. Similarly, the effective lifetimes of the B 0 s meson decays, B 0 depend on the CP-even and CP-odd contributions in the decay amplitudes, which are a priori unknown. These decays are generated assuming the mean B 0 s lifetime. If their true effective lifetimes differ significantly from this value, then the efficiency estimates will be biased. The maximum size of such an effect is estimated by weighting simulated decay modes to have their effective lifetimes equal to the lifetimes of the heavy and light B 0 s mass eigenstates. The largest shift with respect to the nominal efficiency is again taken as a source of systematic uncertainty, with relative shifts in the efficiencies at around the 5% level.
Further sources of uncertainty arise due to data-simulation differences. The effect of mismodelling of the PID response in simulation is evaluated using calibration samples of muons from data to compare data-driven efficiencies with those obtained directly from simulation, with the differences in the ratios of efficiencies of 1-2% taken as systematic uncertainties. The effect of the difference in mass resolution between simulation and data on the selection efficiency is estimated to be below 1% using the results of the fits to the control mode mass distribution. Finally, the efficiencies are evaluated without the application of the generator and reconstruction weights described in Sec. 5 in order to get an upper bound on the likely effect of mismodelling in simulation. Since the effect on the efficiency ratio is found to be small, up to around 10% depending on the decay and BDT interval, this shift is conservatively taken as a source of systematic uncertainty in the efficiency. All these systematic uncertainties are included in the branching fraction fit by applying Gaussian constraints to the efficiencies or efficiency ratios.

Results
The invariant mass distributions of B 0 s and B 0 candidates passing the B 0 No evidence for any of the six signal decay modes is found, with the most significant excesses found in the B 0 s → J/ψ (µ + µ − ) µ + µ − and B 0 → J/ψ (µ + µ − ) µ + µ − searches, amounting to two standard deviations in both cases, calculated using Wilks' theorem [36]. Limits are set on the branching fractions using the CL s method [37] as implemented in the GammaCombo package [38,39] using a one-sided test statistic, with 80 scan points and 2000 pseudoexperiments per scan point. The limits at 95% confidence level are The corresponding CL s scans for all six decays are shown in Fig. 7. Note that in the case of B 0 (s) → a (µ + µ − ) a (µ + µ − ) decays, these limits are evaluated assuming a promptly decaying intermediate scalar with a mass of 1 GeV/c 2 . The limits quoted for B 0 (s) → J/ψ (µ + µ − ) µ + µ − decays include the J/ψ → µ + µ − branching fraction. These results constitute the most stringent limits on each of the six B 0 (s) → µ + µ − µ + µ − decays to date and supersede previous results by the LHCb experiment [16].  Figure 6: Distribution of the µ + µ − µ + µ − invariant mass of candidates passing the B 0 (s) → J/ψ (µ + µ − ) µ + µ − selection in the most sensitive BDT interval, with the fit models used to determine the branching fractions of (left) B 0 s → J/ψ (µ + µ − ) µ + µ − and (right) B 0 → J/ψ (µ + µ − ) µ + µ − decays overlaid.