Study of the rare decays of B 0 s and B 0 mesons into muon pairs using data collected during 2015 and 2016 with the ATLAS detector

: A study of the decays B 0 s ! (cid:22) + (cid:22) (cid:0) and B 0 ! (cid:22) + (cid:22) (cid:0) has been performed using 26 : 3 fb (cid:0) 1 of 13 TeV LHC proton-proton collision data collected with the ATLAS detector in 2015 and 2016. Since the detector resolution in (cid:22) + (cid:22) (cid:0) invariant mass is comparable to the B 0 s - B 0 mass di(cid:11)erence, a single (cid:12)t determines the signal yields for both decay modes. This results in a measurement of the branching fraction B ( B 0 s ! (cid:22) + (cid:22) (cid:0) ) = (cid:0) 3 : 2 +1 : 1 (cid:0) 1 : 0 (cid:1) (cid:2) 10 (cid:0) 9 and an upper limit B ( B 0 ! (cid:22) + (cid:22) (cid:0) ) < 4 : 3 (cid:2) 10 (cid:0) 10 at 95% con(cid:12)dence level. The result is combined with the Run 1 ATLAS result, yielding B ( B 0 s ! (cid:22) + (cid:22) (cid:0) ) = (cid:0) 2 : 8 +0 : 8 (cid:0) 0 : 7 (cid:1) (cid:2) 10 (cid:0) 9 and B ( B 0 ! (cid:22) + (cid:22) (cid:0) ) < 2 : 1 (cid:2) 10 (cid:0) 10 at 95% con(cid:12)dence level. The combined result is consistent with the Standard Model prediction within 2.4 standard deviations in the B ( B 0 ! (cid:22) + (cid:22) (cid:0) )-B ( B 0 s ! (cid:22) + (cid:22) (cid:0) ) plane. systematic uncertainties. contours use of both the dimuon (26.3 fb (cid:0) 1 and the reference (15.1 fb (cid:0) 1 datasets.

The notation used throughout the paper refers to the combination of processes and their charge-conjugates, unless otherwise specified.The B 0 s → µ + µ − and B 0 → µ + µ − branching fractions are measured relative to the reference decay mode B + → J/ψ(→ µ + µ − )K + which is abundant and has a well-measured branching fraction B(B + → J/ψ K + ) × B(J/ψ → µ + µ − ).The B 0 → µ + µ − (B 0 s → µ + µ − ) branching fraction can be extracted as: where N d (N s ) is the B 0 → µ + µ − (B 0 s → µ + µ − ) signal yield, N J/ψK + is the B + → J/ψ K + reference channel yield, ε µ + µ − and ε J/ψK + are the corresponding values of acceptance times efficiency (measured in fiducial regions defined in Section 9), and f u / f d ( f u / f s ) is the ratio of the hadronisation probabilities of a b-quark into B + and B 0 (B 0 s ).In the quantity D ref = N J/ψK + × (ε µ + µ − /ε J/ψK + ), the ε ratio takes into account relative differences in efficiencies, integrated luminosities and the trigger selections used for the signal and the reference modes.Signal and reference channel events are selected with similar dimuon triggers.One half of the reference channel sample is used to determine the normalisation and the other half is used to tune the kinematic distributions of simulated events.
The event selection uses variables related to the B candidate decay time, thus introducing a dependence of the efficiency on the signal lifetime.The relation between the measured branching fraction and the corresponding value at production is established assuming the decay time distribution predicted in the SM, where the decay occurs mainly through the heavy eigenstate B 0 (s),H of the B 0 (s) -B 0 (s) system.Some models of new physics [16,17] predict modifications to the decay time distribution of B 0 s → µ + µ − and a comparison with the experimental result requires a correction to the ratio of the time-integrated efficiencies entering D ref .
The ATLAS inner tracking system, muon spectrometer and, for efficient identification of muons, also the calorimeters, are used to reconstruct and select the event candidates.Details of the detector, trigger, data sets, and preliminary selection criteria are discussed in Sections 2 and 3. A blind analysis was performed in which data in the dimuon invariant mass range from 5166 to 5526 MeV were removed until the procedures for event selection and the details of signal yield extraction were completely defined.Section 4 introduces the three main categories of background.Section 5 describes the strategy used to reduce the probability of hadron misidentification.The final sample of candidates is selected using a multivariate classifier, designed to enhance the signal relative to the dominant dimuon background component, as discussed in Section 6. Checks on the distributions of the variables used in the multivariate classifier are summarised in Section 7.They are based on the comparison of data and simulation for dimuon events, for B + → J/ψ K + candidates and for events selected as B 0 s → J/ψ φ → µ + µ − K + K − , which provide an additional validation of the procedures used in the analysis.Section 8 details the fit procedure used to extract the yield of B + → J/ψ K + events.The determination of the ratio of efficiencies in the signal and the reference channels is presented in Section 9. Section 10 describes the extraction of the signal yield, obtained with an unbinned maximum-likelihood fit performed on the dimuon invariant mass distribution.In this fit, events are separated into classifier intervals to maximise the fit sensitivity.The results for the branching fractions B(B 0 s → µ + µ − ) and B(B 0 → µ + µ − ) are reported in Section 11 and combined with the full Run 1 results in Section 12.

ATLAS detector, data and simulation samples
The ATLAS detector1 consists of three main components: an inner detector (ID) tracking system immersed in a 2 T axial magnetic field, surrounded by electromagnetic and hadronic calorimeters and by the muon spectrometer (MS).A full description can be found in Ref. [18], complemented by Ref. [19] for details about the new innermost silicon pixel layer that was installed for Run 2. This analysis is based on the Run 2 data recorded in 2015 and 2016 from pp collisions at the LHC at √ s = 13 TeV.Data used in the analysis were recorded during stable LHC beam periods.Data quality requirements were imposed, notably on the performance of the MS and ID systems.The total integrated luminosity collected by ATLAS in this period is 36.2fb −1 with an uncertainty of 2.1%.These values are determined using a methodology similar to that detailed in Ref. [20], based on calibration of the luminosity scale using x-y beam-separation scans, and use the LUCID-2 detector [21] for the baseline luminosity measurement.The total effective integrated luminosity used in this analysis -accounting for trigger prescales -amounts to 26.3 fb −1 for the signal and 15.1 fb −1 for the reference channel.
Samples of simulated Monte Carlo (MC) events are used for training and validation of the multivariate analyses, for the determination of the efficiency ratios, and for developing the procedure used to determine the signal.Exclusive MC samples were produced for the signal channels B 0 s → µ + µ − and B 0 → µ + µ − , the reference channel B + → J/ψ K + (J/ψ → µ + µ − ), and the control channel s) → hh decays with h ( ) being a charged pion or kaon, and inclusive decays B → J/ψ X as well as the exclusive B + → J/ψ π + decay.
Most of the dimuon candidates in the data sample originate from the decays of hadrons produced in the hadronisation of b b pairs.The inclusive b b → µ + µ − X MC sample used to describe this background requires the presence of two muons in the final state, with both muons originating from the b b decay chain.The size of this sample is equivalent to roughly three times the integrated luminosity of the data.
The MC samples were generated with P 8 [22].The ATLAS detector and its response were simulated using G 4 [23,24].Additional pp interactions in the same and nearby bunch crossings (pile-up) are included in the simulation.Muon reconstruction and triggering efficiencies are corrected in the simulated samples using data-driven scale factors.The scale factors for the trigger efficiencies are obtained by comparing data and simulation efficiencies determined with a J/ψ tag-and-probe method.This procedure yields scale factors as a function of the muon transverse momentum and pseudorapidity, which are applied throughout the analysis [25].Reconstruction and selection efficiencies are obtained from simulation and similarly corrected according to data-driven comparisons.In addition to these efficiency corrections, simulated events are reweighted to reproduce the pile-up multiplicity observed in data, and according to the equivalent integrated luminosity associated with each trigger selection.
Using the iterative reweighting method described in Ref. [26], the simulated samples of the exclusive decays considered are adjusted with two-dimensional data-driven weights (DDW) to correct for the differences between simulation and data observed in the B meson transverse momentum and pseudorapidity distributions.DDW obtained from B + → J/ψ K + decays are used to correct the simulation samples in the signal and reference channels.DDW obtained from the B 0 s → J/ψ φ control channel are found to agree with those from B + → J/ψ K + , showing that the same corrections are applicable to B 0 s and B 0 decays.
Residual differences between data and simulation studied in the B + → J/ψ K + and B 0 s → J/ψ φ signals are treated as sources of systematic uncertainty in the evaluation of the signal efficiency, as discussed in Section 9.The only exception to this treatment is the B meson isolation (I 0.7 in Section 6 and Table 1), where residual differences are used to reweight the signal MC events and the corresponding uncertainties are propagated to account for residual systematic uncertainty effects.
Similarly to the exclusive decays, the kinematic distributions of the inclusive b b → µ + µ − X MC sample are reweighted with corrections obtained from the dimuon invariant mass sidebands in data.

Data selection
For data collected during the LHC Run 2, the ATLAS detector uses a two-level trigger system, consisting of a hardware-based first-level trigger and a software-based high-level trigger.A first-level dimuon trigger [27] selects events requiring that one muon has p T > 4 GeV and the other has p T > 6 GeV.A full track reconstruction of the muon candidates is performed by the high-level trigger, where an additional loose selection is imposed on the dimuon invariant mass m µµ , accepting candidates in the range 4 GeV to 8.5 GeV. Due to the increased pile-up in 2016 data, an additional selection was added at this trigger stage, requiring the vector from the primary vertex to the dimuon vertex to have a positive component (L xy ) along the dimuon's transverse momentum direction.The effect of this selection is accounted for in the analysis but has no consequence since stricter requirements are applied in the full event selection (see Section 6).
The signal channel, the reference channel B + → J/ψ K + and the control channel B 0 s → J/ψφ were selected with trigger prescale factors that vary during the data-taking period.In the 36.2fb −1 of data analysed, the prescaling of the trigger approximately averages to a reduction by a factor 1.4, giving an effective integrated luminosity for the signal sample of 26.3 fb −1 , while for the reference and control channels 15.1 fb −1 were collected due to an effective prescale of 2.4.These effects are taken into account in the extraction of the signal branching fraction, through the ε factors in Eq. ( 1).
Using information from the full offline reconstruction, a preliminary selection is performed on candidates for B 0 In the ID system, muon candidates are required to have at least one measured hit in the pixel detector and two measured hits in the semiconductor tracker.They are also required to be reconstructed in the MS, and to have |η| < 2.5.The offline muon pair must pass the p T > 4 GeV and p T > 6 GeV requirements imposed by the trigger.Furthermore, the muon candidates are required to fulfil tight muon quality criteria [28]; this requirement is relaxed to loose for the hadron misidentification studies in Section 5. Kaon candidates must satisfy similar requirements in the ID, except for a looser requirement of p T > 1 GeV.
The computed B meson properties are based on a decay vertex fitted to two, three or four tracks, depending on the decay process to be reconstructed.The B candidates are required to have a χ 2 per degree of freedom below 6 for the fit to the B vertex, and below 10 for the fit to the J/ψ → µ + µ − vertex.The selections 2915 < m(µ + µ − ) < 3275 MeV and 1005 < m(K + K − ) < 1035 MeV are applied to the J/ψ → µ + µ − and the φ → K + K − vertices, respectively.In the fits to the B + → J/ψ K + and B 0 s → J/ψ φ channels, the reconstructed dimuon mass is constrained to the world average J/ψ mass [29].
Reconstructed B candidates are retained if they satisfy p B T > 8.0 GeV and |η B | < 2.5.The invariant mass of each B candidate is calculated using muon trajectories measured by combining the information from the ID and MS to improve upon the mass resolution obtained from ID information only [30].
The invariant mass range considered for the B 0 (s) → µ + µ − decay starts at 4766 MeV and is 1200 MeV wide.Within this range a 360-MeV-wide signal region is defined, starting at 5166 MeV.The remainder of the range defines the upper and lower mass sidebands of the analysis.
For the reference and control channels, the mass range considered is 4930-5630 (5050-5650) MeV for B + → J/ψ K + (B 0 s → J/ψ φ), where 5180-5380 (5297-5437) MeV is the peak region and higher and lower mass ranges comprise the mass sidebands used for background subtraction.
The coordinates of primary vertices (PV) are obtained from charged-particle tracks not used in the decay vertices, and that are constrained to the luminous region of the colliding beams in the transverse plane.The matching of a B candidate to a PV is made by extrapolating the candidate trajectory to the point of closest approach to the beam axis, and choosing the PV with the smallest distance along z.Simulation shows that this method matches the correct vertex with a probability above 99% for all relevant pile-up conditions.
To reduce the large background in the B 0 (s) → µ + µ − channel before applying the final selection based on multivariate classifiers, a loose collinearity requirement is applied between the momentum of the B candidate ( − → p B ) and the vector from the PV to the decay vertex ( − → ∆x).The absolute value of the azimuthal angle α 2D between these two vectors is required to be smaller than 1.0 radians.The combination ∆R flight = α 2D 2 + (∆η) 2 , where ∆η is the difference in pseudorapidity, is required to satisfy ∆R flight < 1.5.

Background composition
The background to the B 0 (s) → µ + µ − signal originates from three main sources: Continuum background, the dominant combinatorial component, which consists of muons originating from uncorrelated hadron decays and is characterised by a weak dependence on the dimuon invariant mass; Partially reconstructed decays, where one or more of the final-state particles (X) in a b-hadron decay is not reconstructed, causing these candidates to accumulate in the low dimuon invariant mass sideband (this background includes a significant contribution from semileptonic decays where one of the muons is a misidentified hadron, discussed below); Peaking background, due to B 0 (s) → hh decays, with both hadrons misidentified as muons.
The continuum background consists mainly of muons produced independently in the fragmentation and decay chains of a b-quark and a b-quark.It is studied in the signal mass sidebands, and it is found to be well described by the inclusive b b → µ + µ − X MC sample.
The partially reconstructed decays consist of several topologies: The b b → µ + µ − X MC sample is used to investigate the background composition after the analysis selection.All backgrounds in this sample have a dimuon invariant mass distribution mainly below the mass range considered in this analysis, with a high-mass tail extending through the signal region.The simulation does not contemplate sources other than muons from b b decays: c c and prompt contributions are not included.All possible origins of two muons in the b b decay tree are, however, analysed, after classification into the mutually exclusive continuum and partially reconstructed categories described above.This sample is used only to identify suitable functional models for the corresponding background components, and as a benchmark for these models.No shape or normalisation constraints are derived from this simulation.This makes the analysis largely insensitive to mismatches between background simulation and data.
The semileptonic decays with final-state hadrons misidentified as muons consist mainly of three-body charmless decays B 0 → π − µ + ν, B 0 s → K − µ + ν and Λ b → pµ − ν in which the tail of the invariant mass distribution extends into the signal region.Due to branching fractions of the order of 10 −6 , this background is not large, and is further reduced by the muon identification requirements, discussed in Section 5.The MC invariant mass distributions of these partially reconstructed decay topologies are shown together with the SM signal predictions in Figure 1(a) after applying the preliminary selection criteria described in Section 3. Finally, the peaking background is due to B 0 (s) → hh decays containing two hadrons misidentified as muons.The distributions in Figure 1(b), obtained from simulation, show that these decays populate the signal region.This component is further discussed in Section 5.

Hadron misidentification
In the preliminary selection, muon candidates are formed from the combination of tracks reconstructed independently in the ID and MS.The performance of the muon reconstruction in ATLAS is presented in Ref. [28].Additional studies were performed to evaluate the amount of background related to hadrons erroneously identified as muons.
Detailed simulation studies were performed for the B 0 (s) → hh channel with a full G 4-based simulation [23] of all systems of the ATLAS detector.The vast majority of background events from particle misidentification are due to decays in flight of kaons and pions, in which the muon receives most of the energy of the parent meson.Hence this background is generally related to true muons measured in the MS, but not produced promptly in the decay of a B meson.
The muon candidate is required to pass tight muon requirements in the preliminary selection, which are based on the profile of energy deposits in the calorimeters as well as on tighter ID-MS matching criteria than those used for the loose requirements.Two-body B decays in control regions show that tight selections have, relative to the loose counterpart, an average hadron misidentification probability reduced by a factor 0.39 with a muon reconstruction efficiency of 90%.The resulting final value of the misidentification probability is 0.08% for kaons and 0.1% for pions.Efficiencies and fake rates are relative to the analysis preselections, including tracking but excluding any muon requirement.
The background due to B 0 (s) → hh , with double misidentification of hh as µ + µ − , has a reconstructed invariant mass distribution that peaks at 5240 MeV, close to the B 0 mass, and is effectively indistinguishable from the B 0 signal (see Figure 1(b)).The expected number of peaking-background events can be estimated in a way analogous to that for the signal, from the number of observed B + → J/ψ K + events using Eq. ( 1), after taking into account the expected differences from muon identification variables and trigger selections.World average [29] values for the branching fractions of B 0 and B 0 s into Kπ, KK and ππ are used, together with the hadron misidentification probabilities obtained from simulation.This results in 2.7 ± 1.3 total expected peaking-background events, after the reference multivariate selection.2 When selecting loose muons and inverting the additional requirements imposed in the tight muon selection, the number of events containing real muons is substantially reduced, while the number of peakingbackground events is approximately two times larger than in the sample obtained with the nominal selection.A fit to data for this background-enhanced sample returns 6.8 ± 3.7 events, which translates into a peaking-background yield in the signal region of 2.9 ± 2.0 events, in good agreement with the simulation.
Besides the peaking background, the tight muon selection also reduces the semileptonic contributions with a single misidentified hadron.Simulation yields 30 ± 3 events expected from B 0 → π − µ + ν and B 0 s → K − µ + ν in the final sample, with a distribution kinematically constrained to be mostly below the signal region.The Λ b → pµ − ν contribution is negligible due to the smaller production cross section and the low rate at which protons fake muons.

Continuum background reduction
A multivariate analysis, implemented as a boosted decision tree (BDT), is employed to enhance the signal relative to the continuum background.This BDT is based on the 15 variables described in Table 1.The discriminating variables can be classified into three groups: (a) B meson variables, related to the reconstruction of the decay vertex and to the collinearity between − → p B and the flight vector between the production and decay vertices − → ∆x; (b) variables describing the muons that form the B meson candidate; and (c) variables related to the rest of the event.The selection of the variables aims to maximise the discrimination power of the classifier without introducing significant dependence on the invariant mass of the muon pair.
The same discriminating variables were used in the previous analysis based on the full Run 1 dataset [15].The removal of individual variables was explored to simplify the BDT input, however, this results inevitably in a significant reduction of the BDT separation power.To minimise the dependence of the classifier on the effects of pile-up, the additional tracks considered to compute the variables I 0.7 , DOCA xtrk and N close xtrk are required to be compatible with the primary vertex matched to the dimuon candidate.
The correlations among the discriminating variables were studied in the MC samples for signal and continuum background discussed in Section 2, and in data from the sidebands of the µ + µ − invariant mass Table 1: Description of the 15 input variables used in a BDT classifier to discriminate between signal and continuum background.When the BDT classifier is applied to B + → J/ψ K + and B 0 s → J/ψ φ candidates, the variables related to the decay products of the B mesons refer only to the muons from the decay of the J/ψ.Horizontal lines separate the classifications into groups (a), (b) and (c) respectively, as described in the text.For category (c), additional tracks are required to have p T >500 MeV.

IP 3D
B Three-dimensional impact parameter of the B candidate to the associated PV.
DOCA µµ Distance of closest approach (DOCA) of the two tracks forming the B candidate (three-dimensional). ∆φ µµ Azimuthal angle between the momenta of the two tracks forming the B candidate.
Significance of the larger absolute value of the impact parameters to the PV of the tracks forming the B candidate, in the transverse plane.
Significance of the smaller absolute value of the impact parameters to the PV of the tracks forming the B candidate, in the transverse plane.The simulated signal sample and the data from the dimuon invariant mass sideband regions are used for training and testing the classifier.As discussed in Section 2, simulated signal samples are corrected for muon reconstruction efficiency differencies between simulation and data, and reweighted according to the distributions of p T and |η| of the dimuon and of the pile-up observed in data.The BDT training is done using the TMVA toolkit [31].Sideband data are used for the BDT training and optimisation.The sample is subdivided into three randomly selected separate and equally populated subsamples used in turn to train and validate the selection efficiency of three independent BDTs.The resulting BDTs are found to produce results that are statistically compatible, and are combined in one single classifier in such a way that each BDT is applied only to the part of the data sample not involved in the BDT training.
Figure 2 shows the distribution of the BDT output variable for simulated signal and backgrounds, separately for continuum background and partially reconstructed events.Also shown is the BDT distribution for dimuon candidates from the sidebands of the invariant mass distribution in data.The BDT output was found to not have any significant correlation with the dimuon invariant mass.The final selection requires a BDT output value larger than 0.1439, corresponding to signal and continuum background efficiencies of 72% and 0.3% respectively.The analysis uses all candidates after this selection; however, accepted events with BDT values close to the selection threshold are effectively only constraining the background models.3For this reason, signal and reference channel yields and efficiencies are measured relative to the signal reference selection discussed in Section 9, while the events in the final selection with lower BDT values are used to improve the background modelling.1).

Data-simulation comparisons
The points correspond to the sideband data, while the continuous-line histogram corresponds to the continuum MC distribution, normalised to the number of data events.The filled-area histogram shows the signal MC distribution for comparison.The bottom insets report the data/MC ratio, zoomed-in in order to highlight discrepancies in the region that is most relevant for the analysis.
The distributions of the discriminating variables are also used to compare simulation and data in the B + → J/ψ K + and B 0 s → J/ψ φ samples.To perform these comparisons, for each variable the contribution of the background is subtracted from the B + → J/ψ K + (B 0 s → J/ψ φ) signal.For this purpose, a maximum-likelihood fit is performed to the invariant mass distribution, separately in bins of rapidity and transverse momentum.The fit model used is simpler than the one employed for the extraction of the B + signal for normalisation as described in Section 8, but is sufficient for the purpose discussed here.
Figure 4 shows examples of the distributions of the discriminating variables obtained from data and simulation for the reference samples.Observed differences are used to estimate systematic uncertainties with the procedure described in Section 9.The discrepancy visible for the isolation variable I 0.7 in the B + → J/ψ K + channel is the most significant among all variables and both reference channels.

B + → J/ψK + yield extraction
The reference channel yield is extracted with an unbinned extended maximum-likelihood fit to the J/ψK + invariant mass distribution.The functional forms used to model both the signal and the backgrounds are obtained from studies of MC samples.All the yields are extracted from the fit to data, while the shape The variable I 0.7 is also shown in (d) for B 0 s → J/ψ φ events.The points correspond to the sideband-subtracted data, while the line corresponds to the MC distribution, normalised to the number of data events.The highest bin in (c) and (d) accounts for the events with I 0.7 = 1.The bottom insets report the data/MC ratio, zoomed-in in order to highlight discrepancies in the region that is most relevant for the analysis.
parameters are determined from a simultaneous fit to data and MC samples.Free parameters are introduced for the mass scale and mass resolution to accommodate data-MC differences.The best-fit values indicate a negligibly poorer resolution and a mass shift at the level of 2 MeV.
The fit includes four components: B + → J/ψ K + decays, Cabibbo-suppressed B + → J/ψ π + decays in the right tail of the main peak, partially reconstructed B decays (PRD) where one or more of the final-state particles are missing, and the non-resonant background composed mostly of b b → J/ψ X decays.All components other than the last one have shapes constrained by MC simulation as described below, with the data fit including an additional Gaussian convolution to account for possible data-MC discrepancies in mass scale and resolution.The shape of the B + → J/ψ K + mass distribution is parameterised using a Johnson The various components of the spectrum are described in the text.The inset at the bottom of the plot shows the bin-by-bin pulls for the fit, where the pull is defined as the difference between the data point and the value obtained from the fit function, divided by the error from the fit.
S U function [32,33].The final B + → J/ψ K + yield includes the contribution from radiative effects (i.e.where photons are emitted from the B decay products).The B + → J/ψ π + decays are modelled by the sum of a Johnson S U function and a Gaussian function, where all parameters except the normalisation are determined from the simulation.The decay modes contributing to the PRD are classified in simulation on the basis of their mass dependence.Each of the three resulting categories contributes to the overall PRD shape with combinations of Fermi-Dirac and exponential functions, contributing differently in the low-mass region.Their shape parameters are determined from simulation.Finally, the non-resonant background is modelled with an exponential function with the shape parameter extracted from the fit.The normalisation of each component is unconstrained in the fit, which is therefore mostly independent of external inputs for the branching fractions.The residual dependence of the PRD model shapes on the relative branching fractions of the contributing decays is considered as a source of systematic uncertainty.The resulting fit, shown in Figure 5, yields 334 351 B + → J/ψ K + decays with a statistical uncertainty of 0.3%.The ratio of yields of B + → J/ψ π + and B + → J/ψ K + is (3.71 ± 0.09)% (where the uncertainty reported is statistical only), in agreement with the expectation from the world average [29] of (3.84 ± 0.16)%.Some systematic uncertainties are included by design in the fit.For example, the effect of the limited MC sample size is included by performing a simultaneous fit to data and MC samples.Scaling factors determined in the fit to data account for the differences in mass scale and resolution between data and simulation.Additional systematic uncertainties are evaluated by varying the default fit model described above.They take into account the kinematic differences between data and the MC samples used in the fit, differences in efficiency between B + and B − decays and uncertainties in the relative fractions and shapes of PRD and in the shape of the various fit components.The stability of this large sample fit is verified by repeating the fit with different initial parameter values.In each case, the change relative to the default fit is recorded, symmetrised and used as an estimate of the systematic uncertainty.The main contributions to the systematic uncertainty come from the functional models of the background components, the composition of the PRD and the signal charge asymmetry.The total systematic uncertainty in the B + yield amounts to 4.8%.9 Evaluation of the B + → J/ψ K + to B 0 (s) → µ + µ − efficiency ratio The ratio of efficiencies The signal reference BDT selection, defined as BDT > 0.2455, has an efficiency of about 54% (51%) in the signal (reference) channel.The overall efficiency ratio R ε is 0.1176 ± 0.0009 (stat.)± 0.0047 (syst.), with uncertainties determined as described below.
The ratio R ε is computed using the mean lifetime of B 0 s [29,34] in the MC generator.The same efficiency ratios apply to the B 0 s → µ + µ − and B 0 → µ + µ − decays, within the MC statistical uncertainty of 0.8%.The statistical uncertainties in the efficiency ratios come from the finite number of events available for the simulated samples.The systematic uncertainty affecting R ε comes from five sources.
The first contribution is due to the uncertainties in the data-driven weights introduced in Section 2, and amounts to 0.8%.This term is assessed by creating alternative datasets using correction factors that are randomly sampled in accord with their nominal values and uncertainties.The RMS value of the distribution of R ε obtained from these datasets is taken as the systematic uncertainty.
A second contribution of 1.0% is related to the muon trigger and reconstruction efficiencies.The effect of the uncertainties in the data-driven efficiencies is evaluated using random sampling, as above.A 3.2% systematic uncertainty contribution arises from the differences between data and simulation observed in the modelling of the discriminating variables used in the BDT classifier (Table 1).For each of the 15 variables, the MC samples for B 0 (s) → µ + µ − and B + → J/ψ K + are reweighted with the ratio of the B + → J/ψ K + event distributions in sideband-subtracted data and the MC simulation.The isolation variable I 0.7 is computed using charged-particle tracks only, and differences between B + and B 0 s are expected and were observed in previous studies [26].Hence for this variable the reweighting procedure for the B 0 s → µ + µ − MC sample is based on B 0 s → J/ψ φ data.For all discriminating variables except I 0.7 , the value of the efficiency ratio is modified by less than 2% by the reweighting procedure and each variation is taken as an independent contribution to the systematic uncertainty in the efficiency ratio.For I 0.7 the reweighting procedure changes the efficiency ratio by about 6%.Because of the significant mis-modelling, the MC samples obtained after reweighting on the distribution of I 0.7 are taken as a reference, thus correcting the central value of the efficiency ratio.The 1% uncertainty in the I 0.7 correction is added to the sum in quadrature of the uncertainties assigned to the other discriminating variables.The total uncertainty in the modelling of the discriminating variables is the dominant contribution to the systematic uncertainty in R ε .
A fourth source of systematic uncertainty arises from differences between the B 0 s → µ + µ − and the B + → J/ψ K + channel related to the reconstruction efficiency of the kaon track and of the B + decay vertex.These uncertainties are mainly due to inaccuracy in the modelling of passive material in the ID.
The corresponding systematic uncertainty is estimated by varying the detector model in simulations, which results in changes between 0.4% and 1.5% depending on the η range considered.The largest value is used in the full eta range.
Finally, the uncertainty associated with reweighting the simulated events as a function of the pile-up multiplicity distribution contributes 0.6%.A correction to the efficiency ratio for B 0 s → µ + µ − is needed because of the width difference ∆Γ s between the B 0 s eigenstates.According to the SM, the decay B 0 s → µ + µ − proceeds mainly through the heavy state B s,H [1,16], which has width Γ s,H = Γ s − ∆Γ s /2, which is 6.6% smaller than the average Γ s [29].The variation in the value of the B 0 s → µ + µ − mean lifetime was tested with simulation and found to change the B 0 s efficiency, and consequently the B 0 s to B + efficiency ratio, by +3.3%.This correction is applied to the central value of D ref used in Section 11 for the determination of B(B 0 s → µ + µ − ).4 Due to the small value of ∆Γ d , no correction needs to be applied to the B 0 → µ + µ − decay.

Extraction of the signal yield
Dimuon candidates passing the preliminary selection and the selections against hadron misidentification and continuum background are classified according to four intervals (with boundaries at 0.1439, 0.2455, 0.3312, 0.4163 and 1) in the BDT output.Repeating the Run 1 analysis approach, each interval is chosen to give an equal efficiency of 18% for signal MC events, and they are ordered according to increasing signal-to-background ratio.
An unbinned extended maximum-likelihood fit is performed on the dimuon invariant mass distribution simultaneously across the four BDT intervals.The first two bins contribute very little to the signal determination and are included for background modelling.They were verified with MC pseudo-experiments to have negligible relevance for the signal extraction.The result of the fit is the total yield of B 0 s → µ + µ − and 4 The decay time distribution of B 0 s → µ + µ − is predicted to be different from the one of B s,H in scenarios of new physics, with the effect related to the observable A µµ ∆Γ [16,17].The maximum possible deviation from the SM prediction of A µµ ∆Γ = +1 is for A µµ ∆Γ = −1, for which the decay time distribution of B 0 s → µ + µ − corresponds to the distribution of the B s,L eigenstate.In the comparison with new-physics predictions, the value of B(B 0 s → µ + µ − ) obtained from this analysis should be corrected by +3.6% or +7.8% respectively for A µµ ∆Γ = 0 and −1.
B 0 → µ + µ − events in the three most sensitive BDT intervals.The parameters describing the background are allowed to vary freely and are determined by the fit.The normalisations of the individual fit components, including the signals, are completely unconstrained and allowed to take negative values.The ratios of the signal yields in different BDT bins are constrained to equal the ratios of the signal efficiencies in those same bins, as discussed in Section 10.1, where the signal and background fit models are described.The systematic uncertainties due to variations in the relative signal and background efficiencies between BDT intervals, to the signal parameterisation and to the background model are discussed in Sections 10.1 and 10.2.Each is modelled in the likelihood as a multiplicative Gaussian distribution whose width is equal to the corresponding systematic uncertainty.

Signal and background model
The signal and background models are derived from simulations and from data collected in the mass sidebands of the search region.
The invariant mass distribution of the B 0 (s) → µ + µ − signal is described with two double-Gaussian distributions, centred respectively at the B 0 or B 0 s mass.The shape parameters are extracted from simulation, where they are found to be uncorrelated with the BDT output.Systematic uncertainties in the mass scale and resolutions are considered separately.Figure 6 shows the invariant mass distributions for B 0 and B 0 s , obtained from MC events and normalised to the SM expectations.Section 9 explains how  systematic uncertainties affect the overall selection efficiency for signal candidates.The separation of the candidates according to BDT bins introduces an additional dependence on the relative efficiencies in each BDT bin, and systematic uncertainties in these relative efficiencies must be accounted for.Two different procedures are explored.First, the distribution of the BDT output is compared between MC simulation and background-subtracted data for the reference and control channels.The differences observed in the ratio of data to simulation are described with a linear dependence on the BDT output.The linear dependencies observed for B + → J/ψ K + and B 0 s → J/ψ φ are in turn used to reweight the BDT-output distribution in the B 0 (s) → µ + µ − MC sample.The maximum corresponding absolute variations in the efficiencies are equal to +1.7% and −2.3% respectively in the second and fourth BDT intervals, with the third interval basically unaffected.A second assessment of the systematic uncertainties in the relative efficiency of the BDT intervals is obtained with a procedure similar to the one used for the event selection (Section 9).For each discriminating variable, the MC sample is reweighted according to the difference between simulation and data observed in the reference channels.The variation in the efficiency of each BDT interval is taken as the contribution to the systematic uncertainty due to mis-modelling of that variable.The sum in quadrature of the variations due to all discriminating variables is found to be similar in the B + → J/ψ K + and B 0 s → J/ψ φ channels.Absolute variations of ±1.0%, ±2.4% and ±4.4% are found in the second, third and fourth BDT intervals respectively.The first of these procedures is used as a baseline for inclusion of Gaussian terms in the signal extraction likelihood to account for the uncertainty in the relative signal efficiency in the three most sensitive BDT bins.Care is taken in constraining the sum of the efficiencies of the three intervals sensitive to the signal, since that absolute efficiency and the corresponding uncertainty is parameterised with the R ε term.
Figure 7 shows the distribution of the BDT output from data and simulation for the reference channels, after reweighting the MC sample.The MC distribution for B 0 (s) → µ + µ − events is also shown, illustrating how the linear deviation obtained from the reference channels affects the simulated signal BDT output.When studying these effects, the linear fits to the ratios in Figures 7(a) and 7(b) are performed in the range corresponding to the three BDT bins with the highest signal-to-background ratio, since the remaining bin is insensitive to the signal contribution.
The background in the signal fit is composed of the types of events described in Section 4: (a) the continuum background; (b) the background from partially reconstructed b → µ + µ − X events, which is present mainly in the low mass sideband; (c) the peaking background.
The non-peaking contributions have a common mass shape model, with parameters constrained across the BDT bins in the fit as described below, and independent yields across BDT bins and components.
Both in simulation and sideband data, the continuum background has a small linear dependence on the dimuon invariant mass.In the simulation, the slope parameter has a roughly linear dependence versus BDT interval; the mass sidebands in data confirm this trend, albeit with large statistical uncertainty.This dependence is included in the fit model.The small systematic uncertainties due to deviations from this assumption are discussed below in Section 10.2.
The b → µ + µ − X background has a dimuon invariant mass distribution that falls monotonically with increasing dimuon mass.The mass dependence is derived from data in the low mass sideband, and described with an exponential function with the same shape in each BDT interval.The value of the shape parameter is extracted from the fit to data.
The invariant mass distribution of the peaking background is very similar to the B 0 signal, as shown in Figure 1(b).The description of this component is obtained from MC simulation, which indicates that the shape and normalisation are the same for all BDT bins.In the fit, this contribution is included with fixed mass shape and with a normalisation of 2.9 ± 2.0 events, as discussed in Section 5.This contribution is equally distributed among the three highest intervals of the BDT output.
The fitting procedure is tested with MC pseudo-experiments, as discussed below.

Systematic uncertainties in the fit
Studies based on MC pseudo-experiments are used to assess the sensitivity of the fit to the input assumptions.Variations in the description of signal and background components are used in the generation of these samples.The corresponding changes in the average numbers, N s and N d , of B 0 s and B 0 events determined by the fit, run in the nominal configuration, are taken as systematic uncertainties.The size of the variations used in the generation of the MC pseudo-experiments is determined in some cases by known characteristics of the ATLAS detector (reconstructed momentum scale and momentum resolution), in others using MC evaluation (background due to semileptonic three-body B 0 (s) decays and to B c → J/ψ µ + ), and in others from uncertainties determined from data in the sidebands or from simulation (shapes of the background components and their variation across the BDT intervals).
The MC pseudo-experiments were generated with the normalisation of the continuum and b → µµX components obtained from the fit to the data in the sidebands of the invariant mass distribution, and the peaking background from the expectation discussed in Section 5.The signal was generated with different configurations, roughly covering the range between zero and twice the expected SM yield.
For all variations of the assumptions and all configurations of the signal amplitudes the distributions of the differences between fit results and generated values are used to evaluate systematic uncertainties.In addition, distributions obtained from MC pseudo-experiments generated and fitted according to the nominal fit model are used to study systematic biases deriving from the fit procedure.For both signal yields, the bias is smaller than 15% of the fit error, for true values of the B 0 s → µ + µ − branching fraction above 5 × 10 −10 .
The shifts in N s or N d are combined by considering separately the sums in quadrature of the positive and negative shifts and taking the larger as the symmetric systematic uncertainty.The total systematic uncertainty is found to increase with the assumed size of the signal, with a dependence σ N s syst = 3 + 0.05N s and σ N d syst = 2.9 + 0.05N s + 0.05N d .Most of the shifts observed have opposite sign for N s and N d , resulting in a combined correlation coefficient in the systematic uncertainties of ρ syst = −0.83.
The systematic uncertainties discussed in this Section are included in the fit to the µ + µ − candidates in data.The fit for the yield of B 0 s and B 0 events is modified by including in the likelihood two smearing parameters for N s and N d that are constrained by a two-dimensional Gaussian distribution parameterised by the values of σ N s syst , σ N d syst and ρ syst .

Results of the signal yield extraction
The numbers of background events contained in the signal region (5166-5526 MeV) are computed from the interpolation of the data observed in the sidebands.This procedure yields 2685 ± 37, 330 ± 14, 51 ± 6 and 7.9 ± 2.6 events respectively in the four intervals of BDT output.For comparison, the total expected numbers of signal events according to the SM prediction are 91 and 10 for N s and N d respectively, equally distributed among the three intervals with the highest signal-to-background ratio.
In those three BDT intervals, in the unblinded signal region, a total of 1951 events in the full mass range of 4766-5966 MeV are used in the likelihood fit to signal and background.Without applying any bounds on the values of the fitted parameters, the values determined by the fit are N s = 80 ± 22 and N d = −12 ± 20, where the uncertainties correspond to likelihood variations satisfying −2 ∆ ln(L) = 1.The likelihood includes the systematic uncertainties discussed above, but statistical uncertainties largely dominate.The result is consistent with the expectation from simulation.The uncertainties in the result of the fit are discussed in Section 11, where the measured values of the branching fractions are presented.
Figure 8 shows the dimuon invariant mass distributions in the four BDT intervals, together with the projections of the likelihood.A modified Kolmogorov-Smirnov (KS) test [35] is used to estimate the fit quality: the p-value is estimated by comparing the maximum of the KS distance across the four histograms of Figure 8 with the distribution of the same quantity from pseudo-experiments generated with the shape resulting from the fit to data.This procedure yields a compatibility probability of 84%.

Branching fraction extraction
The branching fractions for the decays B 0 s → µ + µ − and B 0 → µ + µ − are extracted from data using a maximum-likelihood fit.The likelihood is obtained from the one used for N s and N d by replacing the fit parameters with the corresponding branching fractions divided by normalisation terms in Eq. ( 1), and including Gaussian multiplicative factors for the normalisation uncertainties.All results are obtained profiling the fit likelihood with respect to all parameters involved other than the branching fraction(s) of interest.
The normalisation terms include external inputs for the B + branching fraction and the relative hadronisation probability.The branching fraction is obtained from world averages [29] as the product of B(B + → Table 3: Breakdown of the expected systematic uncertainties in B(B 0 (s) → µ + µ − ).The measurements are dominated by statistical uncertainty, followed by the systematic uncertainty from the fit.The latter is dominated by contributions from the mass scale uncertainty and the parameterisation of the b → µ + µ − X background.The statistical uncertainties reported here are obtained from the maximisation of the fit likelihood and are meant only as a reference for the relative scale uncertainties.
Table 3 gives a breakdown of the estimated contributions of systematic and statistical uncertainties.The results are dominated by statistical uncertainties, with the most prominent source of systematic uncertainty coming from fit uncertainties, where the largest contributors are the mass scale and b → µ + µ − X background parameterisation.
Given the statistical regime of the analysis, the likelihood contours of Figure 9(a) cannot be immediately translated into contours with the conventional coverage of one, two and three Gaussian standard deviations.Moreover, the contours extend into regions of negative branching fractions, which are unphysical.In order to address these points, a Neyman construction [36] is employed to obtain the 68.3%, 95.5% and 99.7% confidence intervals in the B(B 0 s → µ + µ − ) -B(B 0 → µ + µ − ) plane.This construction yields the contours shown in Figure 9 branching fractions (obtained from the unconstrained likelihood maximum) are in all cases inputs to the Neyman construction, which, by design, results in physically allowed values for the resulting branching fractions.

Combination with the Run 1 result
The likelihood function from the current result is combined with the likelihood function from the Run 1 result [15].The only common parameters in the combination are the fitted B(B 0 (s) → µ + µ − ) and the combination of external inputs Except for F ext , all nuisance parameters are treated as uncorrelated between the two likelihoods, with both likelihoods including their individual parameterisations of systematic uncertainty effects.A negligible change in the results, corresponding to shifts in central values and uncertainties between 1% and 4%, is found when all sources of systematic uncertainty are assumed to be fully correlated.
The maximum of the combined likelihood corresponds to When applying the one-dimensional Neyman construction described in Section 11 to this combined likelihood, whose maximum is unconstrained and allowed to access the unphysical (negative) region, the 68.3% confidence interval obtained for B(B The upper limit at 95% CL on B(B 0 → µ + µ − ) is determined with the same Neyman procedure, yielding Using the predicted SM branching fractions from Section 1, the analysis is expected to yield on average a measurement of 3.6 +0.9 −0.8 × 10 −9 for B(B 0 s → µ + µ − ) and an upper limit of 5.6 × 10 −10 for B(B 0 → µ + µ − ).
The Run 1 and Run 2 results are found to be 1.2 standard deviations apart.Using both runs, the combined significance of the B 0 s → µ + µ − signal is estimated to be 4.6 standard deviations, and the combined branching fraction measurements differ by 2.4 standard deviations from the SM values in the B(B 0 → µ + µ − )-B(B 0 s → µ + µ − ) plane.These significancies are assessed purely from the evaluation of likelihood ratios.

Conclusions
A study of the rare decays of B 0 s and B 0 mesons into oppositely charged muon pairs is presented, based on 36.2 fb −1 of 13 TeV LHC proton-proton collision data collected by the ATLAS experiment in 2015 and 2016.
For the B 0 s the branching fraction is determined to be B(B 0 s → µ + µ − ) = 3.2 +1.1 −1.0 × 10 −9 , where the uncertainty includes both the statistical and systematic contributions.The result is consistent with the analysis expectation of 3.6 +1.1 −1.0 × 10 −9 in the SM hypothesis.For the B 0 an upper limit B(B 0 → µ + µ − ) < 4.3 × 10 −10 is placed at the 95% confidence level, with an expected upper bound of 7.1 × 10 −10 in the SM hypothesis.The limit is compatible with the SM prediction.
The result presented in this paper is combined with the ATLAS result from the full Run 1 dataset to obtain B(B 0 s → µ + µ − ) = 2.8 +0.8 −0.7 × 10 −9 and B(B 0 → µ + µ − ) < 2.1 × 10 −10 .All the results presented are compatible with the branching fractions predicted by the SM as well as with currently available experimental results.ERC, ERDF, Horizon 2020, and Marie Skłodowska-Curie Actions, European Union; Investissements d' Avenir Labex and Idex, ANR, France; DFG and AvH Foundation, Germany; Herakleitos, Thales and Aristeia programmes co-financed by EU-ESF and the Greek NSRF, Greece; BSF-NSF and GIF, Israel; CERCA Programme Generalitat de Catalunya, Spain; The Royal Society and Leverhulme Trust, United Kingdom.
[12] LHCb Collaboration, Measurement of the B 0 s → µ + µ − Branching Fraction and Search for B 0 → µ + µ − Decays at the LHCb Experiment, Phys.Rev. Lett. 111 (2013)  u Also at Giresun University, Faculty of Engineering, Giresun; Turkey.v Also at Graduate School of Science, Osaka University, Osaka; Japan.w Also at Hellenic Open University, Patras; Greece.
x Also at Horia Hulubei National Institute of Physics and Nuclear Engineering, Bucharest; Romania.y Also at II.Physikalisches Institut, Georg-August-Universität Göttingen, Göttingen; Germany.

Figure 1 :
Figure 1: (a) Dimuon invariant mass distribution for the partially reconstructed background (as categorised in Section 4), from simulation, before the final selection against continuum is applied but after all other requirements.The different components are shown as stacked histograms, normalised according to world-averaged measured branching fractions.The SM expectations for the B 0 (s) → µ + µ − signals are also shown for comparison.Continuum background is not included here.(b) Invariant mass distribution of the B 0 (s) → hh peaking background components after the complete signal selection is applied.The B 0 s → π + π − and B 0 → K + K − contributions are negligible on this scale.In both plots the vertical dashed lines indicate the blinded analysis region.Distributions are normalised to the expected yield for the integrated luminosity of 26.3 fb −1 .
the B candidate transverse momentum − → p T B .χ 2 PV,DV xy Compatibility of the separation − → ∆x between production (i.e.associated PV) and decay (DV) vertices in the transverse projection: P min L The smaller of the projected values of the muon momenta along − → p T B .I 0.7 Isolation variable defined as ratio of | − → p T B | to the sum of | − → p T B | and the transverse momenta of all additional tracks contained within a cone of size ∆R = (∆φ) 2 + (∆η) 2 = 0.7 around the B direction.Only tracks matched to the same PV as the B candidate are included in the sum.DOCA xtrk DOCA of the closest additional track to the decay vertex of the B candidate.Only tracks matched to the same PV as the B candidate are considered.N close xtrk Number of additional tracks compatible with the decay vertex (DV) of the B candidate with ln( χ 2 xtrk,DV ) < 1.Only tracks matched to the same PV as the B candidate are considered.χ 2 µ,xPV Minimum χ 2 for the compatibility of a muon in the B candidate with any PV reconstructed in the event.distribution.There are significant linear correlations among the variables χ 2 PV,DV xy , L xy , |d 0 | max -sig., |d 0 | min -sig.and χ 2 µ,xPV .The variables IP 3D B , DOCA µµ and I 0.7 have negligible correlation with any of the others used in the classifier.

Figure 2 :
Figure 2: BDT output distribution for the signal and background events after the preliminary selection and before applying any reweighting to the BDT input variables: (a) simulation distributions for B 0 s → µ + µ − signal, continuum, partially reconstructed b → µ + µ − X events and B c decays; (b) dimuon sideband candidates (which also include prompt contributions, mainly at lower BDT values and not simulated in the continuum MC sample), compared with the continuum MC sample and the simulated signal.All distributions are normalised to unity in (a) and to data sidebands in (b).

Figure 3 Figure 3 :
Figure 3 compares the distributions of two discriminating variables in the continuum background MC sample with data in the dimuon mass sidebands.Agreement with the sideband data is fair and the

Figure 4 :
Figure 4: Data and MC distributions in B + → J/ψ K + events for the discriminating variables: (a) |α 2D |, (b) ln χ 2PV,DV xy and (c) I 0.7 .The variable I 0.7 is also shown in (d) for B 0 s → J/ψ φ events.The points correspond to the sideband-subtracted data, while the line corresponds to the MC distribution, normalised to the number of data events.The highest bin in (c) and (d) accounts for the events with I 0.7 = 1.The bottom insets report the data/MC ratio, zoomed-in in order to highlight discrepancies in the region that is most relevant for the analysis.

Figure 5 :
Figure5: Result of the fit to the J/ψK + invariant mass distribution for all B + candidates in half of the data events.The various components of the spectrum are described in the text.The inset at the bottom of the plot shows the bin-by-bin pulls for the fit, where the pull is defined as the difference between the data point and the value obtained from the fit function, divided by the error from the fit.
enters the D ref term defined in Section 1: D ref = N J/ψK + /R ε .Both channels are measured in the fiducial acceptance for the B meson, defined as p B T > 8.0 GeV and |η B | < 2.5.Correspondingly, ε(B + → J/ψ K + ) and ε(B 0 (s) → µ + µ − ) are measured within the B meson fiducial acceptance and include additional final state particles acceptance as well as trigger, reconstruction and selection efficiencies.The final state particles acceptance is defined by the selection placed on the particles in the final state: |η µ | < 2.5 and p µ T > 6.0 (4.0) GeV for the leading (trailing) muon p T , p K T > 1.0 GeV and |η K | < 2.5 for kaons.

Figure 6 :
Figure 6: Dimuon invariant mass distribution for the B 0 s and B 0 signals from simulation.The results of the double-Gaussian fits are overlaid.The two distributions are normalised to the SM prediction for the expected yield with an integrated luminosity of 26.3 fb −1 .

Figure 7 :
Figure 7: BDT value distributions in data and MC simulation for (a) B + → J/ψ K + , (b) B 0 s → J/ψ φ.The MC samples are normalised to the number of data events passing the signal reference BDT selection (Section 6). Figure (c) illustrates the BDT output for the B 0 s → µ + µ − signal, with the dashed histogram illustrating the effect of the linear reweighting on the BDT output discussed in the text.The vertical dashed lines correspond to the boundaries of the BDT intervals used in the B 0 (s) → µ + µ − signal fit.

Figure 8 :
Figure 8: Dimuon invariant mass distributions in the unblinded data, in the four intervals of BDT output.Superimposed is the result of the maximum-likelihood fit.The total fit is shown as a continuous line, with the dashed lines corresponding to the observed signal component, the b → µµX background, and the continuum background.The signal components are grouped in one single curve, including both the B 0 s → µ + µ − and the (negative) B 0 → µ + µ − component.The curve representing the peaking B 0 (s) → hh background lies very close to the horizontal axis in all BDT bins.

Figure 10 Figure 10 :
Figure 10 shows the likelihood contours for the combined Run 1 and Run 2 result for B(B 0 s → µ + µ − ) and B(B 0 → µ + µ − ), for values of −2∆ ln (L) equal to 2.3, 6.2 and 11.8, relative to the maximum of the likelihood.The contours for the result from 2015-2016 Run 2 data are overlaid for comparison.

Table 2
summarises these systematic uncertainties.

Table 2 :
Summary of the uncertainties in R ε .
The efficiency ratio enters in Eq. (1) with the D ref term defined in Section 1, multiplied by the number of observed B ± candidates.The total uncertainty in D ref is ±6.3%.
101805, arXiv: 1307.5024[hep-ex].[13] CMS and LHCb Collaborations, Observation of the rare B 0 s → µ + µ − decay from the combined analysis of CMS and LHCb data, Nature 522 (2015) 68, arXiv: 1411.4413[hep-ex].[14] LHCb Collaboration, Measurement of the B 0 s → µ + µ − Branching Fraction and Effective Lifetime and Search for B 0 → µ + µ − Decays, Phys.Rev. Lett.118 (2017) 191801, arXiv: 1703.05747[hep-ex].[15] ATLAS Collaboration, Study of the rare decays of B 0 s and B 0 into muon pairs from data collected during the LHC Run 1 with the ATLAS detector, Eur.Phys.J. C 76 (2016) 513, arXiv: 1604.04263[hep-ex].Also at California State University, East Bay; United States of America.c Also at Centre for High Performance Computing, CSIR Campus, Rosebank, Cape Town; South Africa.d Also at CERN, Geneva; Switzerland.e Also at CPPM, Aix-Marseille Université, CNRS/IN2P3, Marseille; France.f Also at Département de Physique Nucléaire et Corpusculaire, Université de Genève, Genève; Switzerland.g Also at Departament de Fisica de la Universitat Autonoma de Barcelona, Barcelona; Spain.h Also at Departamento de Física Teorica y del Cosmos, Universidad de Granada, Granada (Spain); Spain.i Also at Departamento de Física, Instituto Superior Técnico, Universidade de Lisboa, Lisboa; Portugal.j Also at Department of Applied Physics and Astronomy, University of Sharjah, Sharjah; United Arab Emirates.k Also at Department of Financial and Management Engineering, University of the Aegean, Chios; Greece.l Also at Department of Physics and Astronomy, University of Louisville, Louisville, KY; United States of America.m Also at Department of Physics and Astronomy, University of Sheffield, Sheffield; United Kingdom.n Also at Department of Physics, California State University, Fresno CA; United States of America.o Also at Department of Physics, California State University, Sacramento CA; United States of America.p Also at Department of Physics, King's College London, London; United Kingdom.q Also at Department of Physics, St. Petersburg State Polytechnical University, St. Petersburg; Russia.r Also at Department of Physics, Stanford University; United States of America.s Also at Department of Physics, University of Fribourg, Fribourg; Switzerland.t Also at Department of Physics, University of Michigan, Ann Arbor MI; United States of America. b

z
Also at Institucio Catalana de Recerca i Estudis Avancats, ICREA, Barcelona; Spain.aa Also at Institut für Experimentalphysik, Universität Hamburg, Hamburg; Germany.ab Also at Institute for Mathematics, Astrophysics and Particle Physics, Radboud University Nijmegen/Nikhef, Nijmegen; Netherlands.ac Also at Institute for Particle and Nuclear Physics, Wigner Research Centre for Physics, Budapest; Hungary.ad Also at Institute of Particle Physics (IPP); Canada.ae Also at Institute of Physics, Academia Sinica, Taipei; Taiwan.a f Also at Institute of Physics, Azerbaijan Academy of Sciences, Baku; Azerbaijan.ag Also at Institute of Theoretical Physics, Ilia State University, Tbilisi; Georgia.ah Also at Instituto de Física Teórica de la Universidad Autónoma de Madrid; Spain.ai Also at Istanbul University, Dept. of Physics, Istanbul; Turkey.a j Also at Joint Institute for Nuclear Research, Dubna; Russia.Also at LAL, Université Paris-Sud, CNRS/IN2P3, Université Paris-Saclay, Orsay; France.al Also at Louisiana Tech University, Ruston LA; United States of America.am Also at LPNHE, Sorbonne Université, Paris Diderot Sorbonne Paris Cité, CNRS/IN2P3, Paris; France.an Also at Manhattan College, New York NY; United States of America.ao Also at Moscow Institute of Physics and Technology State University, Dolgoprudny; Russia.ap Also at National Research Nuclear University MEPhI, Moscow; Russia.aq Also at Physikalisches Institut, Albert-Ludwigs-Universität Freiburg, Freiburg; Germany.ar Also at School of Physics, Sun Yat-sen University, Guangzhou; China.as Also at The City College of New York, New York NY; United States of America.at Also at The Collaborative Innovation Center of Quantum Matter (CICQM), Beijing; China.au Also at Tomsk State University, Tomsk, and Moscow Institute of Physics and Technology State University, Dolgoprudny; Russia.av Also at TRIUMF, Vancouver BC; Canada.aw Also at Universita di Napoli Parthenope, Napoli; Italy.