Measurement of properties of Bs0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {\mathrm{B}}_{\mathrm{s}}^0 $$\end{document}→ μ+μ− decays and search for B0→ μ+μ− with the CMS experiment

Results are reported for the Bs0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {\mathrm{B}}_{\mathrm{s}}^0 $$\end{document}→ μ+μ− branching fraction and effective lifetime and from a search for the decay B0→ μ+μ−. The analysis uses a data sample of proton-proton collisions accumulated by the CMS experiment in 2011, 2012, and 2016, with center-of-mass energies (integrated luminosities) of 7 TeV (5 fb−1), 8 TeV (20 fb−1), and 13 TeV (36 fb−1). The branching fractions are determined by measuring event yields relative to B+→ J/ψK+ decays (with J/ψ → μ+μ−), which results in the reduction of many of the systematic uncertainties. The decay Bs0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {\mathrm{B}}_{\mathrm{s}}^0 $$\end{document}→ μ+μ− is observed with a significance of 5.6 standard deviations. The branching fraction is measured to be ℬBs0→μ+μ−=2.9±0.7exp±0.2frag×10−9\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ \mathrm{\mathcal{B}}\left({\mathrm{B}}_{\mathrm{s}}^0\to {\upmu}^{+}{\upmu}^{-}\right)=\left[2.9\pm 0.7\left(\exp \right)\pm 0.2\left(\mathrm{frag}\right)\right]\times {10}^{-9} $$\end{document}, where the first uncertainty combines the experimental statistical and systematic contributions, and the second is due to the uncertainty in the ratio of the Bs0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {\mathrm{B}}_{\mathrm{s}}^0 $$\end{document} and the B+ fragmentation functions. No significant excess is observed for the decay B0→ μ+μ−, and an upper limit of ℬ(B0 → μ+μ−) < 3.6 × 10−10 is obtained at 95% confidence level. The Bs0\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {\mathrm{B}}_{\mathrm{s}}^0 $$\end{document}→ μ+μ− effective lifetime is measured to be τμ+μ−=1.70−0.44+0.61\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {\tau}_{\upmu^{+}{\upmu}^{-}}={1.70}_{-0.44}^{+0.61} $$\end{document} ps. These results are consistent with standard model predictions.


Introduction
Leptonic B meson decays offer excellent opportunities to perform precision tests of the standard model (SM) of particle physics because of minimal hadronic uncertainties in the theoretical predictions [1][2][3][4][5]. In the SM, the decays B 0 s → µ + µ − and B 0 → µ + µ − proceed only via loop diagrams and are also helicity suppressed, leading to very small expected decay time-integrated branching fractions, (3.66 ± 0.14) × 10 −9 and (1.03 ± 0.05) × 10 −10 , respectively. Theoretical uncertainties in the calculation of these branching fractions have been reduced in recent years as a result of progress in lattice quantum chromodynamics (QCD) [6][7][8][9][10], in the calculation of electroweak effects at next-to-leading order [2], and in the calculation of QCD effects at next-to-next-to-leading order [3]. Enhanced electromagnetic contributions from virtual photon exchange have been shown [4,5] to produce larger corrections than previously assumed in the theoretical uncertainties. The B 0 s → µ + µ − branching fraction has been measured in proton-proton (pp) collisions by the CMS, LHCb, and ATLAS Collaborations [11][12][13][14]. For the decay B 0 → µ + µ − , evidence at the three standard deviation level has been obtained by the CMS and LHCb Collaborations in a combined analysis [12] of √ s = 7 and 8 TeV data. However, this has not been confirmed by LHCb after incorporating 13 TeV data [13], nor by ATLAS [14].
In this paper, we report updated results for the B 0 s → µ + µ − and B 0 → µ + µ − branching fractions, as well as a measurement of the B 0 s → µ + µ − effective lifetime. The data were collected in pp collisions at the CERN LHC, and correspond to integrated luminosities of 5 and 20 fb −1 recorded in 2011 and 2012 at √ s = 7 and 8 TeV, respectively, during Run 1 of the LHC, and 36 fb −1 recorded in 2016 at √ s = 13 TeV, during Run 2. Depending on the context, the symbol B is used to denote the B 0 , B 0 s , and B + mesons and/or the Λ b baryon, and charge conjugation is implied throughout, except as noted. The present branching fraction measurements supersede the previous CMS results [11], which used Run 1 data only. The main differences with respect to the previous branching fraction results include the greater statistical precision of the larger data sample, an improved muon identification algorithm, based on a newly developed boosted decision tree (BDT), and better constraints against background contamination in the search for B 0 → µ + µ − . In addition to the BDT used for muon identification, the analysis employs a second BDT in the candidate selection. For clarity we will refer to them as the muon BDT and the analysis BDT. The Run 1 data are reanalyzed using the new muon identification algorithm (with its improved BDT), but the candidate selection incorporates the same analysis BDT as used in the original Run 1 analysis, as described in ref. [11]. The binning of the analysis BDT discriminator distribution, used for the final result extraction, was modified, based on the best expected performance. For the Run 2 data, a new analysis BDT was developed.

JHEP04(2020)188
The signal sample consists of B candidates constructed from two oppositely charged muons, which are constrained to originate from a common origin and have an invariant mass in the range 4.8 < m µ + µ − < 6.0 GeV. Within the signal sample, a signal region defined by 5.20 < m µ + µ − < 5. 45 GeV is analyzed only after all analysis procedures have been finalized. The background is estimated from mass sidebands in data and from Monte Carlo (MC) simulation for specific background sources from B decays. The main background categories are (1) combinatorial background with two genuine muons from semileptonic decays of separate B hadrons (e.g., B 0 → D * − µ + ν), (2) rare B decays with two muons (e.g., from B → hµµ where h ∈ {π, K, p}), and (3) rare B decays with one hadron (e.g., from B → hµν) or two hadrons (e.g., from B → hh ( ) ) misidentified as muons. The combinatorial background affects both B 0 s → µ + µ − and B 0 → µ + µ − and is the limiting factor for the measurement of the former. The search for the decay B 0 → µ + µ − , with its smaller expected branching fraction and an expected signal-to-background ratio significantly below one, is additionally affected by rare B decays, since background from hadronic B decays produces a dimuon invariant mass distribution that peaks underneath the B 0 → µ + µ − signal. The background from rare B decays has only a minor impact on the B 0 s → µ + µ − results. Because the mass resolution of the CMS detector has a strong dependence on the pseudorapidity η of the muons, the analysis sensitivity benefits from a division of the data sets into channels based on the pseudorapidity η f µ of the most forward muon of the B candidate, where |η f µ | = max(|η µ + |, |η µ − |). A central and a forward channel are defined for all running periods, with different boundaries for Run 1 and Run 2 because of changing trigger requirements.
A normalization sample based on B + → J/ψK + decays (with J/ψ → µ + µ − ) is used in the measurement of the branching fractions. In addition, a control sample based on B 0 s → J/ψφ decays (with J/ψ → µ + µ − and φ → K + K − ) is used to study differences between B + and B 0 s characteristics (fragmentation, isolation, selection efficiency, etc.) in data and to compare with MC simulation. These samples are reconstructed by adding one or two charged tracks with a kaon mass hypothesis to two oppositely charged muons, requiring the dimuon pair to be consistent with J/ψ meson decay.
The B 0 s → µ + µ − branching fraction is determined using [16], and f u /f s is the ratio of the B + and B 0 s fragmentation functions. The value f s /f u = 0.252 ± 0.012 (exp) ± 0.015 (CMS), a combination [16] with input from measurements by the LHCb [19] and ATLAS Collaborations [20], is used. Beyond the experimental uncertainty from ref. [16], we assign an additional uncertainty (labeled CMS) by adding in quadrature uncertainties evaluated from the consideration of two other issues. First, we derive an uncertainty of 0.008 from the difference -3 -

JHEP04(2020)188
between the value of f s /f u in ref. [16], obtained at √ s = 7 TeV, and that in ref. [21], obtained at √ s = 13 TeV. Second, using the parametrization of the transverse momentum (p T ) dependence in ref. [21], we determine a difference of 0.013 between the f s /f u values at the average p T of ref. [21] and the average p T of the B 0 s → µ + µ − candidates in this analysis (see below, table 3). This also covers the substantially smaller p T dependence of ref. [22]. An analogous equation to eq. (1.3) is used to determine the B 0 → µ + µ − branching fraction, where we assume f d /f u = 1 for the ratio of the B 0 to the B + fragmentation functions [16].
The measurement of B(B 0 s → µ + µ − ) and B(B 0 → µ + µ − ) is performed with an extended unbinned maximum likelihood (UML) fit, with probability density functions (PDFs) obtained from simulated event samples and data sidebands. For the determination of the B 0 s → µ + µ − effective lifetime τ µ + µ − , two independent procedures are used. The first is based on a two-dimensional (2D) UML fit to the invariant mass and proper decay time distributions of B 0 s → µ + µ − candidates. The second is based on a one-dimensional (1D) binned maximum likelihood (ML) fit to the background-subtracted proper decay time distribution obtained with the sPlot [23] method.
The presence of multiple pp interactions in an event is referred to as pileup, whose rate is dependent on the instantaneous luminosity. The average number of reconstructed pp interaction vertices is 8, 15, and 18 for the data collected in 2011, 2012, and 2016, respectively.

Event simulation
Simulated event samples, produced with MC programs, are used to optimize the analysis selection requirements and to determine efficiencies for the signal, normalization, and control samples. The background shapes of the dimuon invariant mass distribution for rare B decays, where one or two charged hadrons are misidentified as muons, are also obtained from simulated event samples. The simulated background decay modes include B → hµν, B → hµµ, and B → hh ( ) . For the decay Λ b → pµ − ν µ , the model of ref. [24], based on QCD light-cone sum rules, is used. In addition, the decay B + c → J/ψµ + ν was studied, but is not required for an adequate background description after the full selection has been applied and therefore is not considered in the final analysis.
The simulated event samples are generated with pythia 6.426 [25] for the Run 1 analysis and pythia 8.212 [26] for the Run 2 analysis. In both cases, signal and background events are selected from generic 2 → 2 QCD processes to provide a complete mixture of gluon fusion, gluon splitting, and flavor excitation production. The analysis efficiency varies for these production mechanisms. For instance, in gluon splitting the two b quarks can have a small phase space separation such that a B 0 s → µ + µ − decay is less isolated than in gluon fusion, for which the two b quarks are, to first order, back-to-back in the transverse plane. The mixture in the MC simulation is compared to the mixture in data using the normalization and control samples, and a corresponding systematic uncertainty is assigned. The changing pileup conditions are reflected in the event simulation.
The decay of unstable particles is described using the evtgen [27] program and finalstate photon radiation using the photos [28,29] program. The detector response is simulated with Geant4 [30].

The CMS detector
The CMS experiment is based on a general purpose detector designed and built to study physics at the TeV scale. A detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in ref. [31]. For this analysis, the main subdetectors used are the silicon tracker, composed of pixel and microstrip detectors within a 3.8 T axial magnetic field, and the muon detector, described below. These detectors are divided into a barrel and two endcap sections.
The silicon tracker detects charged particles within |η| < 2.5. The pixel detector is composed of three layers in the barrel region and two disks located on each side in the forward regions. In total, the pixel detector contains about 66 million 100 µm × 150 µm pixels. Further from the interaction region is a microstrip detector, composed of ten barrel layers, and three inner and nine outer disks on either end of the detector, with strip pitches between 80 and 180 µm. In total, the microstrip detector contains around 10 million strips and, together with the pixel detector, yields an impact parameter resolution of about 15 µm. As a consequence of the high granularity of the silicon tracker and the strong and homogeneous magnetic field, a transverse momentum resolution of about 1.5% [32] is obtained for the muons in this analysis. The systematic uncertainty in the track reconstruction efficiency for charged hadrons is estimated to be 4.0 (2.3)% in Run 1 (Run 2) [32,33]. In Run 2, the microstrip detector experienced operational instabilities during the initial period of the run, resulting in a significant impact on the trigger efficiency as the level of pileup increased. The Run 2 data are therefore divided into two separate running periods, denoted 2016A and 2016B, of roughly equal integrated luminosity. Separate sets of MC samples are used to describe these periods. Residual differences between the MC simulation and the Run 2 data lead to a systematic uncertainty of ±0.07 ps in the effective lifetime measurement. This uncertainty is estimated from the variation in the B + lifetime measured with the B + normalization sample in the two running periods and channels. The maximum difference with respect to the result in ref. [16] is assigned as the systematic uncertainty. The uncertainties due to the residual misalignment of the tracker have negligible impact on the branching fraction measurement. For the effective lifetime measurement, a systematic uncertainty of 0.02 ps is determined from this source, using the normalization sample.
Muons are measured within |η| < 2.4 with four muon stations interspersed among the layers of the steel flux-return plates. Each station consists of several layers of drift tubes and cathode strip chambers in the regions |η| < 1.2 and 0.9 < |η| < 2.4, respectively. They are complemented by resistive plate chambers (RPC) covering the range |η| < 1.6. The muon system does not contribute to the p T measurement of the muons relevant for this analysis and is used exclusively for trigger and muon identification purposes. Standalone muons are reconstructed from hits in the three muon subdetectors. They are subsequently combined with tracks found in the silicon tracker to form global muons [34,35]. For a global muon, a standalone muon is linked to a track by comparing their parameters after propagation to a common surface at the innermost muon station of the reconstructed standalone muon track. Events with dimuon candidates are selected using a two-tiered trigger system [36]. The first level (L1), composed of custom hardware processors, uses information from the muon detectors and calorimeters to select events at a rate of around 100 kHz within a time interval of less than 4 µs. The second level, known as the high-level trigger (HLT), consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to around 1 kHz before data storage.
The L1 trigger requires two muon candidates with either no p T requirement or a loose requirement of p T > 3 GeV, depending on the running period. However, there is an implicit p T threshold of about 3.5 GeV in the barrel and 2 GeV in the endcaps since muons must reach the muon detectors. In Run 1, no restriction was imposed on the muon pseudorapidity, while in Run 2 both muons were required to be within |η| < 1.6. As the instantaneous luminosity increased over the course of Run 1, the trigger selection was gradually tightened, an effect that is accounted for in the simulation of the trigger performance. In Run 2, the trigger conditions were stable for the entire running period. For both running periods, the offline analysis selection is more restrictive than the trigger requirements.
At the HLT level, the complete silicon tracker information is available, providing precise muon momentum information for the dimuon invariant mass calculation and vertex fit. This allows more stringent requirements to be placed on the single muon and dimuon p T , and permits the calculation of the three-dimensional (3D) distance of closest approach (d ca ) between the two muons.
For the normalization (B + → J/ψK + ) and control (B 0 s → J/ψφ) samples, the data in Run 1 were collected by requiring the following: two muons, each with p Tµ > 4 GeV and |η| < 2.2, p T µ + µ − > 6.9 GeV, 2.9 < m µ + µ − < 3.3 GeV, d ca < 0.5 cm, and P(χ 2 /dof) > 15%. To reduce the rate of prompt J/ψ candidates, two additional requirements were imposed in the transverse plane: (i) the pointing angle α xy between the dimuon momentum and the vector from the beamspot (defined as the average interaction point) to the dimuon vertex must fulfill cos α xy > 0.9; and (ii) the flight distance significance xy /σ( xy ) must be larger than 3, where xy is the 2D distance between the beamspot and the dimuon vertex and σ( xy ) is its uncertainty. For Run 2, the only changes were to restrict the muons to |η| < 1.4 and to loosen the vertex probability requirement to P(χ 2 /dof) > 10%. In addition, for Run 2, this trigger path was prescaled by a factor between 1 and 8, depending on the instantaneous luminosity.

JHEP04(2020)188
The trigger efficiencies for the various samples are determined from the MC simulation. They are calculated after all muon identification selection criteria, discussed in section 5, and analysis preselection criteria, discussed in section 6, have been applied. For the signal events, the average trigger efficiency is around 70% (up to 75% in the central channel and down to 65% in the forward channel). The trigger efficiency for the normalization and control samples varies from 75% in the central channel to 50% in the forward channel. A systematic uncertainty of 3% in the trigger efficiency ratio between the signal and normalization samples is estimated from simulation by varying the selection efficiency between a very loose preselection level to a level such that only 10% of the preselected events remain.
The operational instabilities of the microstrip detector during the 2016A running period increased the pileup dependence of the HLT. This affected the normalization sample more strongly than the signal sample, because of the requirement for the former sample that the dimuon vertex be well separated from the beamspot. The pileup-dependent normalization deficit is corrected in data with per-event weights that depend on the number of reconstructed pp interaction vertices and xy . The systematic uncertainty associated with this correction amounts to 6% for the 2016A data and 5% for 2016B.

Muon identification
For the analysis, it is important to maintain a high muon identification efficiency while minimizing the probability for charged hadrons to be misidentified as muons. Achieving a low hadron-to-muon misidentification rate is especially important in the search for B 0 → µ + µ − , where the SM branching fraction is roughly an order of magnitude below that for B 0 s → µ + µ − and there are additional contributions to the background from two-body decays of B hadrons.
To achieve this goal, a new muon BDT was trained separately for the Run 1 and Run 2 data, using the tmva framework [37]. For both data samples, the starting point for muon identification is the set of global muons obtained from the standard CMS muon reconstruction [34,35]. In contrast, in the previous analysis [11], the starting point for the BDT was a sample of so-called tight muons, which forms a subset of the full global muon sample.
The muon BDT training was performed using simulated events derived from signal samples for the muons and from background samples for the misidentified charged hadrons. Statistically independent MC samples were used to evaluate the performance of the muon BDT in the optimization process. The variables used in the new muon BDT can be grouped into three categories according to whether they are associated with measurements from the silicon tracker, the muon system, or the combined global muon reconstruction.
The variables determined with the silicon tracker are sensitive to the quality of the muon track measurement and exploit the fact that tracks from charged hadrons and tracks from particles with a decay-in-flight often have lower quality. These variables are the track χ 2 /dof, the fraction of valid hits divided by the number of expected hits, the number of layers containing hits, and the change in track curvature. The changes in track curvature are identified using a dedicated kink-finding algorithm, which computes the difference be--7 -tween the predicted and measured azimuthal angle ϕ of the track at each layer. The values of the squared ϕ-angle differences, divided by their associated squared uncertainties, are then summed to obtain a discriminating variable.
The variables associated with measurements in the muon system are the standalone muon χ 2 /dof, the standalone muon compatibility with the muon hypothesis, and a variable quantifying the time-of-flight error in the RPC muon subsystem.
The variables related to the global muon reconstruction are the χ 2 /dof of the momentum matching between the extrapolated silicon tracker and standalone muon at the innermost muon layer; the χ 2 /dof of the position matching between the extrapolated silicon tracker and standalone muon at the innermost muon layer; the χ 2 between all silicon tracker hit positions and the global muon position; the output of the kink-finding algorithm (as described above, but applied to the global muon trajectory); the probability of the global muon track χ 2 /dof; and the product of the charges, as determined in the silicon tracker and the muon system.
The variables used in the muon BDT optimization process are chosen iteratively to provide the best performance with a minimal set of variables (the size of the MC training samples is limited). The same variable set is used in the muon BDT training for Runs 1 and 2.
The new muon BDT achieves an average misidentification rate of 6 × 10 −4 and 10 −3 for pions and kaons, respectively, for both Runs 1 and 2, together with a muon efficiency of about 70 (76)% for Run 1 (Run 2). Compared to the previous BDT used in ref. [11], the new BDT improves the muon efficiency by about 5% (absolute) for the same hadron misidentification rate. The proton misidentification rate is approximately 10 −4 .
The performance of the muon BDT is validated by comparing its behavior in simulation with that in data, using event samples in which a kinematically selected two-body decay provides a source of independently identified muons or hadrons. For muons, the decay J/ψ → µ + µ − is used. Charged hadrons are selected with the decays K 0 S → π + π − for pions, φ → K + K − for kaons, and Λ → pπ − for protons. These samples are used to compare the distributions of the variables used in the muon BDT in background-subtracted data and simulation, as well as the corresponding single-hadron misidentification probabilities. The distributions of all variables used in the muon BDT are found to be consistent between data and simulation. After correcting for trigger and reconstruction biases, the misidentification probabilities in data and simulation are also found to be consistent. This comparison is used to assign a 10% relative uncertainty in the pion and kaon misidentification probabilities, which are found to be roughly uniform over the range 5 < p T < 20 GeV. The limited statistical precision of the Λ → pπ − validation sample, with a proton misidentified as a muon, does not allow a differential comparison, and a relative systematic uncertainty of 60% is estimated based on the average difference between data and simulation for the rate of proton misidentification.
An independent study was performed to measure the misidentification rate of charged pions and kaons by reconstructing D * + → D 0 (→ K − π + )π + s decays, where the slow pion π + s allows the unambiguous identification of the charged kaon. This validation sample provides a set of charged hadrons with p T and impact parameter values comparable to those relevant -8 -

JHEP04(2020)188
for the B 0 → µ + µ − analysis. The limited size of the sample allows only a comparison of the integrated misidentification probabilities, which agree within the uncertainties between data and simulation.
To determine the systematic uncertainty in the muon identification efficiency, the muon BDT discriminator distribution is studied in data and simulation for muons from B + → J/ψK + and B 0 s → J/ψφ candidates. The efficiency ratio of the muon BDT discriminator requirement for muons from B + → J/ψK + and B 0 s → J/ψφ decays between data and MC simulation agrees to better than 3% in all analysis channels. This value is used as the estimate of the uncertainty in the relative muon identification efficiency between the signal and normalization modes.

Selection
The B → µ + µ − candidate selection starts with two oppositely charged global muons. To retain more muon candidates for the development of the analysis BDT and the validation of the background estimate, the full muon BDT discriminator requirement is applied only for the extraction of the final result. Both muons must have p T > 4 GeV and be matched to muons that triggered the event. The distance of closest approach d ca between the B meson candidate tracks is required to be less than 0.08 cm. After constraining the two muon tracks to a common (secondary) vertex, the invariant mass is required to satisfy 4.8 < m µ + µ − < 6.0 GeV.
The momentum and vertex position of the B candidate are used to select the primary vertex (PV) from which the B candidate originates, based on the distance of closest approach to each PV of the extrapolated trajectory of the B candidate. In the following, this PV is referred to as the bb-PV. To avoid a possible bias in the bb-PV position, each PV is refit without the B candidate tracks. In this fit, based on an adaptive fitting method [38], a weight from 0 to 1 is assigned to each track. The B candidate is rejected if the average track weight of the bb-PV (excluding the B candidate tracks) is smaller than 0.6.
In the offline analysis, many of the variables with the highest discriminating power are determined in 3D space. The flight-length significance 3D /σ( 3D ) is measured with respect to the bb-PV. For the Run 2 analysis, a correction is applied to 3D and σ( 3D ) to reduce differences between data and simulation. The B candidate pointing angle α is calculated as the opening angle between the B momentum and the vector from the bb-PV to the secondary vertex. The impact parameter δ 3D of the B candidate, its uncertainty σ(δ 3D ), and its significance δ 3D /σ(δ 3D ) are measured with respect to the bb-PV. The χ 2 /dof of the secondary vertex fit is also a powerful discriminant. The decay time is given by the product of the flight length 3D and the invariant mass of the B candidate, divided by the magnitude of the B candidate momentum.
To reduce background, isolation requirements are placed on the B candidate and the muon tracks. The background rejection power and signal efficiency of these requirements is not significantly affected by the increased pileup in Run 2. The presence of other PVs, not associated with the B candidate, requires that the isolation variables be calculated using only tracks that are related to the B candidate and its bb-PV. In the following, -9 -JHEP04(2020)188 the track sums include only those tracks that are associated with the bb-PV or that are not associated with any other PV. The latter class of tracks includes displaced tracks that are not part of any PV, but come close to the B candidate's secondary vertex according to criteria defined below. Tracks that are part of the B candidate are excluded from the track sums.
The B candidate isolation is calculated as I = p TB /(p TB + trk p T ). The sum includes all tracks from the bb-PV with p T > 0.9 GeV and ∆R = (∆η) 2 + (∆ϕ) 2 < 0.7, where ∆η and ∆ϕ are the differences in pseudorapidity and azimuthal angle between the charged track and the B candidate momentum. Tracks not associated with any other PV are also included in the sum if they have d ca < 0.05 cm with respect to the secondary vertex. A similar isolation variable, I µ = p Tµ /(p Tµ + trk p T ), is calculated for each muon, where the requirements p T > 0.5 GeV, ∆R < 0.5, and d ca < 0.1 cm are imposed. The track requirements for both isolation criteria are chosen to provide optimal background rejection and produce good agreement between the data and MC simulation.
A variable N close trk is introduced to specify the number of tracks with p T > 0.5 GeV that satisfy d ca < 0.03 cm with respect to the B candidate vertex. This variable is used to reduce background from partially reconstructed decays, e.g., where one pion is misidentified as a muon. The minimum distance of closest approach d 0 ca to the B candidate secondary vertex is determined with the same set of tracks, with a correction applied for Run 2 to reduce data versus simulation differences.
The final selection relies on an analysis BDT trained [37] with MC signal events (B 0 s → µ + µ − and B 0 → µ + µ − decays) and with combinatorial background events taken from a data sideband in the dimuon invariant mass defined by 5.45 < m µ + µ − < 5.9 GeV. The analysis BDT selection for the Run 1 data is described in ref. [11]. For Run 2, the 2016A and 2016B running periods require separate treatment, as stated above. To avoid possible bias, the data sample is randomly split into three subsets such that the training and validation of the analysis BDT are performed on samples independent of its application. A preselection eliminates events with extreme outlier values in the relevant variables and removes the vast majority of background events where the dimuon vertex is not well separated from the bb-PV. The signal and normalization sample topologies are kept as similar as possible to reduce uncertainties in their efficiency ratios, cf. eq. (1.3). The most important preselection requirements are 3D /σ( 3D ) > 4, xy /σ( xy ) > 4, α 3D < 0.2, δ 3D < 0.02 cm, δ 3D /σ(δ 3D ) < 4, χ 2 /dof < 5, and d ca < 0.08 cm. After this preselection, more than 6000 dimuon background events remain in each subset. Because of this statistical limitation, the analysis BDT optimization starts from a set of core variables: To this list, optional variables, d ca , δ 3D , 3D , xy /σ( xy ), p T , and η, are added. The final analysis BDTs are chosen based on the maximum of S/ √ S + B, where S is the expected B 0 s → µ + µ − signal yield in the mass region 5.3 < m µ + µ − < 5.45 GeV from simulation and B is the expected combinatorial background in that mass region, extrapolating from the data sideband. This approximate figure of merit was used only in the optimization procedure and not in the procedure used to obtain the final result. Systematic uncertainties related to the selection are determined using the normalization and control samples, as described below.   The reconstruction of the B + → J/ψK + normalization sample and the B 0 s → J/ψφ (φ → K + K − ) control sample is similar to the reconstruction of B → µ + µ − candidates. Two oppositely charged global muons with p T > 4 GeV, p T µ + µ − > 7 GeV, and 2.8 < m µ + µ − < 3.2 GeV are combined with either one or two tracks, assumed to be kaons, with p T > 0.6 GeV. The maximum distance of closest approach (d max ca ) between all pairs of the B candidate tracks is required to satisfy d max ca < 0.08 cm. For B 0 s → J/ψφ candidates, the two kaons must have an invariant mass 1.01 < m K + K − < 1.03 GeV. All B candidates with an invariant mass 4.8 < m < 6.0 GeV are retained for further analysis. Since B + → J/ψK + and B 0 s → J/ψφ candidates are analyzed with the same analysis BDT as the B → µ + µ − candidates, the two muons from the J/ψ are refit to a common vertex and this fit χ 2 /dof is used in the analysis BDT, so as to have the same number of degrees of freedom as in the signal decay. The determination of the other variables is based on the complete B candidate secondary vertex, also including the additional kaon(s) in the fit.
The B candidate yields in the normalization sample are determined with binned ML fits. Example invariant mass distributions from Run 2 are shown in figure 1. The B + → J/ψK + signal component is modeled by a double-Gaussian function with common mean. The background is modeled with an exponential function for the combinatorial component, an error function for the partially reconstructed background from B → J/ψKX, and a double-Gaussian function with common mean for B + → J/ψπ + decays. For this latter component, the integral is constrained to 4% of the signal yield [16]  the uncertainty combines the statistical and systematic components (see table 3 below for the yields in different running periods and channels). The systematic uncertainty in the B + → J/ψK + yield is approximately 4% and is determined by comparing the yields when fitting with or without a J/ψ mass constraint. Background-subtracted data distributions are compared to simulation for both the B + → J/ψK + and B 0 s → J/ψφ candidate event samples. As examples, figure 2 shows the most discriminating variables used in the analysis BDT: the flight length significance 3D /σ( 3D ), the pointing angle α, and the number of close tracks N close trk . The distributions are shown for the central channel in the 2016B data sample after the loose preselection for the analysis BDT training has been applied. The ratio between the background-subtracted data and the simulation is shown in the lower plots. The shaded bands in the ratio plots, included for illustration, demonstrate that the data-simulation difference can be as large as around 20%, mostly in peripheral parts of the distributions (the same comment applies to the analogous bands shown below in figures 3 and 4). These residual differences contribute to the systematic uncertainty in the analysis efficiency ratio between signal and normalization, where their impact is reduced. In figure 3, distributions of kinematic variables are displayed: the subleading muon p T , the cosine of the muon helicity angle θ µ − (θ µ − is the angle in the J/ψ rest frame between the µ − and the K + direction), and the B meson candidate proper decay time. The B + → J/ψK + analysis BDT discriminator distributions are plotted in figure 4. To illustrate the discrimination power of the analysis BDT, figure 4 also shows the BDT discriminator distributions from the m µ + µ − data sideband and the B 0 s → µ + µ − signal MC simulation. The systematic uncertainty in the analysis efficiency (accounting for losses in reconstruction, identification and selection) and in its ratio between the signal and normalization modes is estimated with the double ratio of analysis efficiencies between B + → J/ψK + and B 0 s → J/ψφ decays in data and MC simulation, based on the distributions in figure 4. The B 0 s → J/ψφ control sample is used in this context as a placeholder for the signal because of the statistical limitations of the data signal events. Just as the normalization sample has one additional track compared to the signal sample, the control sample has one additional track compared to the normalization sample. Applying the analysis BDT discriminator requirements, the efficiency ratios are determined between the B + → J/ψK + and B 0 s → J/ψφ samples and subsequently the ratios between data and MC simulation are calculated for the double ratio. The deviation from unity is taken as the systematic uncertainty. In Run 2, it is approximately 5%, while in Run 1, it varies between 7 and 10% depending on the year and channel. For the effective lifetime determination, a small systematic uncertainty of 0.02 ps associated with the selection efficiency is estimated from a variation of the analysis BDT requirement.
The signal selection efficiency depends on the proper decay time because of selection requirements in the trigger and offline analysis. In the HLT, the increased instantaneous luminosity in Run 2 led to selection requirements on the maximum impact parameter of tracks relative to the interaction point. These requirements gradually reduce the efficiency for proper decay times larger than 6 ps. The analysis selection requirements on isolation and on the separation of the B candidate secondary vertex from the bb-PV strongly reduce the efficiency for proper decay times below 1 ps. The events in the signal MC simulation are reweighted to correspond to the SM expectation of τ SM µ + µ − = τ B 0 sH = 1.615 ps [16]. A priori, the effective lifetime is unknown, and we therefore use ∆ ≡ [ε tot (τ period. For the B 0 case, where the lifetime difference is much smaller, the average lifetime is used. The systematic uncertainty due to differences between data and simulation for the production mechanism mixture is estimated as follows. In events with a B + → J/ψK + or B 0 s → J/ψφ candidate and a third muon µ 3 , presumed to originate from the decay of the other b hadron in the event, the variable ∆R(B, µ 3 ) provides discrimination between gluon splitting on the one hand and gluon fusion plus flavor excitation on the other. Templates from MC simulation are fit to the data distribution and used to determine the relative production mechanism fractions in data. Reweighting the fractions in the MC simulation to correspond to the fractions determined in data provides an estimate of 3% for the systematic uncertainty in the efficiency ratio.
To cross-check whether a significant variation in f s /f u is observed with respect to the B candidate kinematic variables η (up to |η| < 2.2) and p T (from 10 to 100 GeV), efficiency--14 -

JHEP04(2020)188
corrected ratios of B 0 s → J/ψφ and B + → J/ψK + yields are studied. In addition, a sample of B 0 → J/ψK * 0 (with K * 0 → K + π − ) decays is reconstructed to perform the analogous cross-check for f s /f d , closely following the procedure discussed in ref. [39]. No significant p T or η dependence is observed in either case.
A complementary approach for the study of the systematic uncertainty in the selection efficiency is to consider the standard deviation of the effective branching fraction B(B 0 s → J/ψφ) over all running periods and channels. For this purpose, the control sample yield is substituted for the signal yield in eq. (1.3). The branching fraction B(B 0 s → J/ψφ) is affected by systematic uncertainties in the analysis BDT selection efficiency and yield determination, the kaon tracking efficiency, and possible differences between the isolation of B + and B 0 s mesons due to their different fragmentation. A standard deviation of 4% is observed, substantially smaller than the combination of the above systematic uncertainty contributions. We conclude that the systematic uncertainties are not underestimated.

Branching fraction measurement
The branching fractions B(B 0 s → µ + µ − ) and B(B 0 → µ + µ − ) are determined with a 3D extended UML fit to the dimuon invariant mass m µ + µ − distribution, the relative mass resolution σ(m µ + µ − )/m µ + µ − , and the binary distribution of the dimuon pairing configuration C, where C = +1 (−1) when the two muons bend towards (away from) each other in the magnetic field.
The fit model contains six components: the distributions of the signals B 0 s → µ + µ − and B 0 → µ + µ − , the peaking background B → hh ( ) , the rare semileptonic background B → hµν, the rare dimuon background B → hµµ, and the combinatorial background. The expected yields of the rare background components are normalized to the B + → J/ψK + yield, corrected for the relative efficiency ratios and the respective production ratios. The PDF for each component i (where 1 ≤ i ≤ 6) is (7.1) where the P i terms are the PDFs for the indicated variable.
The binary C distribution is used in evaluating the possible underestimation of the B → hh ( ) background. The misidentification probabilities for h and h ( ' ) are assumed to be independent. However, this assumption is not necessarily correct if the two tracks overlap in the detector. A possible residual enhancement of the double-hadron misidentification probability exists relative to the assumption that this probability factorizes into the product of two separately measured misidentification probabilities. Potential bias from this source can be reduced by requiring that both muon candidates satisfy very strict quality criteria or by demanding that the tracks be spatially separated. The effect is different for two muon candidates that bend towards or away from each other, and thus the C distribution is introduced. A scale factor is used in the fit to account for the change in the hadron misidentification rate if the hadron bends towards another muon candidate. The signal PDFs are based on a Crystal Ball function [40, see appendix D] for the invariant mass and a nonparametric kernel estimator [41] with a superposition of Gaussian -15 -JHEP04(2020)188 kernels for the relative mass resolution. The width σ CB of the Crystal Ball function is a conditional parameter with a linear dependence on the dimuon mass resolution σ CB = κ σ(m µ + µ − ). All parameters, except for the normalizations, are fixed to values obtained from fits to the signal MC distributions. The dimuon mass scale is studied with J/ψ → µ + µ − and Υ(1S) → µ + µ − decays, interpolated to m B 0 s . In Run 1, the MC PDF is shifted by −6.0 (−7.0) MeV at the B 0 s mass for the central (forward) channel, while in Run 2 the shift is −4.4 (−3.1) MeV. The difference in the mass resolution between data and MC simulation is studied with B + → J/ψK + events and is found to be 5%. Since correcting for this difference changes the measured branching fractions by only around 0.2%, no associated systematic uncertainty is assigned.
The peaking background is constructed from the sum of all B → hh ( ) decay modes, weighted by the branching fraction, the product of the single-hadron misidentification probabilities, and the respective production ratios. We assume that the selection efficiency for the peaking background equals the signal selection efficiency. The trigger efficiency is taken as half the signal efficiency, with a 100% relative uncertainty because of the limited size of the MC event samples where both hadrons are misidentified as muons. The invariant mass PDF is modeled with the sum of a Gaussian and a Crystal Ball function with a common mean value. The relative mass resolution is modeled with a kernel estimator as for the signal PDF. The width of the mass distribution is independent of the percandidate mass resolution and is fixed to the distribution obtained from the weighted sum of background sources in the MC simulation.
The shapes of the mass and relative mass resolution distributions for the rare semileptonic background are obtained by adding the MC expectations for B 0 → π − µ + ν, B 0 s → K − µ + ν, and Λ b → pµ − ν µ decays, with a weighting as for the peaking background (with the exception of the trigger efficiency, which is taken to be equal to the signal efficiency). Both the mass and relative mass resolution distributions are modeled with kernel estimators. Rare dimuon background estimates are based on the decays B 0 → π 0 µ + µ − and B − → π − µ + µ − , with the mass and relative mass resolution PDFs modeled with nonparametric kernel estimators. To account for missing contributions and efficiency differences with respect to the signal, these background components (B → hµν and B → hµµ) are scaled with a common factor such that their sum, when added to the combinatorial background, which is extrapolated from the sideband, matches the event yield of data events in the mass region 4.9 < m µ + µ − < 5.2 GeV.
The normalizations of all of the rare background components are constrained within the combined uncertainties of the branching fractions, misidentification probabilities, and efficiencies. For the peaking background, the combined relative uncertainty is about 100%, while the rare semileptonic and dimuonic background components have relative uncertainties of order 15%.
The combinatorial background invariant mass distribution is modeled by a nonnegative Bernstein polynomial of the first degree, whose parameters (both slope and normalization) are determined in the fit. The fit result changes by 2.3% (0.6%) for B 0 s → µ + µ − (B 0 → µ + µ − ) when using an exponential function instead. This is included as a systematic uncertainty. The relative mass resolution is modeled by a kernel function, determined from the events in the mass data sideband.
In the UML fit, the parameters of interest are B(B 0 s → µ + µ − ) and B(B 0 → µ + µ − ). The nuisance parameters are profiled, subject to constraints. Gaussian constraints are used for the uncertainties in B(B + → J/ψK + ), the ratio of efficiencies ε B + tot /ε tot , and the yields of the normalization sample. For the smaller yields of B → hµν, B → hµµ, and B → hh ( ) decays, log-normal priors are used as constraints. Prior to analyzing the events in the signal region, extensive tests were performed with pseudo-experiments to assess the sensitivity of the fitting procedure as well as its robustness and accuracy.
As mentioned in the Introduction, the data are divided into two channels (central or forward) depending on the pseudorapidity of the most forward muon |η f µ |, and into data collection running periods (2011, 2012, 2016A, and 2016B). The separation between the central and forward channels differs between Run 1 and Run 2 because of limitations imposed by the larger trigger rates in Run 2. In Run 1, the central (forward) channel covers |η f µ | < 1.4 (1.4 < |η f µ | < 2.1) and in Run 2, |η f µ | < 0.7 (0.7 < |η f µ | < 1.4). In total, there are eight channels: central and forward in four data-taking running periods. To maximize the sensitivity, channels with sufficient statistical precision are divided into mutually exclusive categories, a low-and a high-range category in the analysis BDT discriminator. The low-range category extends from the boundary given in the first row in table 1 to the boundary in the second row. The high range extends from the boundary in the second row to +1. These analysis BDT discriminator requirements are determined by maximizing the expected sensitivity using the full UML fit framework. The boundaries differ between data-taking running periods because the distributions in the analysis BDT discriminator value differ. In summary, the UML fit is performed simultaneously in 14 categories. Figure 5 (left) shows the mass distribution for the combined high-range analysis BDT categories, with the fit results overlaid. The B 0 s → µ + µ − signal contribution is clearly visible. The corresponding distribution for the combined low-range BDT categories is shown in figure 5 (right). The result of the fit to the data in the 14 categories defined in table 1 is where the experimental uncertainty combines the dominant statistical and systematical terms. The second uncertainty is due to the uncertainty in f s /f u . The systematic uncertainties are summarized in table 2. The event yields of the fit components, the average p T -17 -
The likelihood contours of the fit are shown in figure 6 (left), together with the SM expectation. The correlation coefficient between the two branching fractions is −0.181.
All observed results are consistent within the uncertainties with the SM expectations. Restricting the present analysis to the Run 1 data results in an observed branching fraction of B(B 0 s → µ + µ − ) = (2.3 +1.0 −0.8 ) × 10 −9 with an observed (expected) significance of 3.3 (4.5) standard deviations. This is consistent with the results of ref. [12] when that analysis is restricted to the CMS data set. The upper limit on B(B 0 → µ + µ − ) is substantially improved compared to the limit of our previous study [11], even when using Run 1 data alone, because of more stringent muon identification criteria and the introduction of the UML fit.

Effective lifetime measurement
Two independent fitting methods were developed to determine the B 0 s → µ + µ − effective lifetime, in order to allow for extensive validation, cross-checks, and systematic studies. The primary method consists of an extended 2D UML fit to the dimuon invariant mass and proper decay time [39,46]. The second method employs a 1D approach in which the background is subtracted using the sPlot [23] technique, with a function then fitted to the background-subtracted distribution using a binned ML. The 2D UML approach was chosen as the primary method, prior to analyzing the events in the signal region, because it exhibited better median expected performance.
A slightly modified analysis BDT setup is used for both approaches. For simplicity, a single BDT discriminator requirement, as indicated in table 4, is used for each channel. As before, these requirements are determined by optimizing the expected performance.
The fits are performed over the decay time range 1 < t < 11 ps. For very short times, the reconstruction efficiency is small because of the flight-length significance and isolation requirements, while for long times, the efficiency is reduced in Run 2 because of the HLT High BDT Categories Peaking bkg 4.9 5 5. 1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9 [GeV]    Table 2. Summary of systematic uncertainty sources described in the text. The uncertainties quoted for the branching fraction B(B 0 s → µ + µ − ) are relative uncertainties, while the uncertainties for the effective lifetime τ µ + µ − are absolute and are given for both the 2D UML and sPlot analysis methods. The relative uncertainties in the upper limit on B(B 0 → µ + µ − ) differ for the background yields and parametrization, but have negligible impact on that result. The bottom rows provide the total systematic uncertainty and the total uncertainty in the branching fraction and the effective lifetime measurements. Contributions that are included in other items are indicated by (*). Nonapplicable sources are marked with dashes.
requirements. The results of both approaches are limited by the small number of signal events. Systematic uncertainties have only a small impact on the total uncertainty.

Two-dimensional unbinned maximum likelihood fit
The extended 2D UML fit uses the proper decay time resolution, σ t , as a conditional parameter. The unnormalized PDF P has the following expression:  Table 3. Summary of the fitted yields for B 0 s → µ + µ − , B 0 → µ + µ − , the combinatorial background for 5.2 < m µ + µ − < 5.45 GeV, and the B + → J/ψK + normalization, the average p T of the B 0 s → µ + µ − signal, and the ratio of efficiencies between the normalization and the signal for all 14 categories of the 3D UML branching fraction fit. The high and low ranges of the analysis BDT discriminator distribution are defined in table 1. The size of the peaking background is 5-10% of the B 0 → µ + µ − signal. The average p T is calculated from the MC simulation and has negligible uncertainties. The uncertainties shown include the statistical and systematic components. It should be noted that the B 0 s → µ + µ − and B 0 → µ + µ − yields and their uncertainties are determined from the branching fraction fit and also include the normalization uncertainties.  Table 4. Analysis BDT discriminator minimum requirements per channel and running period for the 1D and 2D effective lifetime fits.
where N sig , N comb , N peak , and N semi are the B 0 s → µ + µ − signal, combinatorial, peaking (both B 0 → µ + µ − and B → hh ( ) ), and semileptonic (B → hµν combined with B → hµµ) background yield Poisson terms, respectively. The invariant mass and decay time PDFs for the signal and background components are described by P (m µ + µ − ) and T (t; σ t ), respectively. The efficiencies ε sig , ε peak , and ε semi for the signal and background components as a function of the proper decay time are determined from simulated event samples.
For the PDFs describing the dimuon invariant mass distribution (P sig , P comb , P peak , P semi ), the signal shape is parametrized by a Crystal Ball function, while the combinatorial, peaking, and semileptonic background contributions are parametrized by a nonnegative Bernstein polynomial of the first degree, the sum of a Crystal Ball function and a Gaussian function with common mean, and a Gaussian -21 -JHEP04(2020)188 function, respectively. For the PDFs describing the proper decay time distributions (T sig , T comb , T peak , T semi ), the signal and background shapes are parametrized by individual exponential functions, which are convolved with a Gaussian function to incorporate the effect of the detector resolution, and with the T sig exponential function depending on the effective lifetime. The signal, peaking, and semileptonic background proper decay time PDFs are corrected with their respective efficiency factors (ε sig , ε peak , ε semi ). Such a correction is not applied to the combinatorial background since the decay time PDF T comb is modeled from data directly. The modeling of the efficiency function has a systematic uncertainty of about 0.01 ps, determined by variation of the parametrization.
The signal yield, the effective lifetime τ µ + µ − , and all parameters of the combinatorial background (except the combinatorial background in the forward channel of 2011, which is held fixed because there are no events in the sideband) are determined in the fit. All other parameters are either constrained (background yields of the peaking and semileptonic components inside log-normal constraints) or fixed to the MC simulation values (all other parameters). The UML fit is performed simultaneously in the eight independent channels defined in table 4.
The combined mass and proper decay time distributions from all channels, with the fit results overlaid, are shown in figure 7. The effective lifetime obtained from the fit is −0.44 ps, (8.2) where the uncertainty represents the combined statistical and systematic terms. Systematic uncertainties, beyond those already discussed, include small contributions from the limited statistical precision of the MC simulation (0.03 ps) and the C correction (0.01 ps). All systematic uncertainties are summarized in table 2.

One-dimensional binned maximum likelihood fit
In the second method, the complete PDF described in section 7 is used to determine sPlot weights. All events passing the analysis BDT requirements in table 4, but without any proper decay time selection, are used for this step. Because of the small number of events in individual channels, an integration is performed over the central and forward channels for both the Run 1 and Run 2 data. The effective lifetime is determined with an exponential function modified to include the channel-dependent resolution and efficiency effects. To properly determine the uncertainty in the effective lifetime from the weighted fit, a custom algorithm [47] is implemented. This algorithm has several features. First, it performs a weighted binned ML fit to the sPlot distribution to provide the correct central value and covariance matrix. Second, to reduce biases associated with large histogram bin widths, it calculates the integral of the PDF, integrating over bins. Third, it incorporates a resolution and efficiency model into the effective decay time PDF. Finally, it provides asymmetric uncertainties in the fit parameters. The determined effective lifetime and the associated variance are consistent with the expectations from pseudo-experiments and the statistical uncertainties are in agreement with the confidence intervals reported by the Neyman construction [48].    Figure 8 shows the mass distribution of all contributing data, without requiring t > 1 ps, and the weighted signal proper decay time distribution, together with the result of the binned ML fit. The fit yields τ µ + µ − = 1.55 +0.52 −0.33 ps, where the uncertainty is the combination of the statistical and systematic contributions. Using pseudo-experiments with post-fit nuisance parameters, a fit bias of +0.09 ps is observed and corrected for in the result above. It is included as a systematic uncertainty. The reasons for this bias are, first, negative yields are not allowed in the weighted ML fit and, second, the sample size at large decay times is very small. The decay time dependence of the selection efficiency leads to a systematic uncertainty of 0.04 ps. All systematic uncertainties are summarized in table 2.
The two fitting methods, the 2D UML fit and the 1D sPlot approach, yield consistent results. The observed total uncertainties in the primary fitting method are about one root-mean-square deviation larger than the expected median uncertainties ( +0. 39 −0.30 ps). The expected median uncertainty for the 1D sPlot approach are +0.49 −0.31 ps. While the uncertainties are sizable, the results are consistent with the SM expectation that only the heavy B sH state contributes to the B 0 s → µ + µ − decay.

Summary
Measurements of the rare leptonic B meson decays B 0 s → µ + µ − and B 0 → µ + µ − have been performed in pp collision data collected by the CMS experiment at the LHC, corresponding to integrated luminosities of 5 fb −1 at center-of-mass energy 7 TeV, 20 fb −1 at 8 TeV, and 36 fb −1 at 13 TeV. The B 0 s → µ + µ − decay is observed with a significance of 5.6 standard deviations and the time-integrated branching fraction is measured to be B(B 0 s → µ + µ − ) = [2.9 ± 0.6 (stat) ± 0.3 (syst) ± 0.2 (frag)] × 10 −9 , where the last uncertainty refers to the uncertainty in the ratio of the B 0 s and the B + fragmentation functions. No significant B 0 → µ + µ − signal is observed and an upper limit B(B 0 → µ + µ − ) < 3.6 × 10 −10 is determined at 95% confidence level. The B 0 s → µ + µ − effective lifetime is found to be τ µ + µ − = [1.70 +0.60 −0.43 (stat)±0.09 (syst)] ps. The results for the branching fractions supersede the previous results from CMS [11], which were based on the 7 and 8 TeV data only. All of the results are in agreement with the standard model predictions.

Acknowledgments
We congratulate our colleagues in the CERN accelerator departments for the excellent performance of the LHC and thank the technical and administrative staffs at CERN and at other CMS institutes for their contributions to the success of the CMS effort. In addition, we gratefully acknowledge the computing centers and personnel of the Worldwide LHC Computing Grid for delivering so effectively the computing infrastructure essential to our analyses. Finally, we acknowledge the enduring support for the construction and operation of the LHC and the CMS detector provided by the following funding agencies: BMBWF and FWF (Austria); FNRS and FWO (Belgium); CNPq, CAPES, FAPERJ, Open Access. This article is distributed under the terms of the Creative Commons Attribution License (CC-BY 4.0), which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.