1 Introduction

The existence of new phenomena beyond those predicted by the Standard Model (SM), hereafter referred to as New Physics (NP), could introduce sizeable effects on \(CP\)-violating observables. In the SM, \(CP\) violation originates from an irreducible complex phase in the Cabibbo–Kobayashi–Maskawa (CKM) matrix that describes the mixing of the mass and weak interaction eigenstates of the quarks [1, 2]. In decays of a \({{B} ^0_{s}} \) meson to a \(CP\) eigenstate, \(CP\) violation can originate from the interference of the amplitude of the decay and that of the adjoint decay preceded by \({{B} ^0_{s}} \)-\({{\overline{B}{}} {}^0_{s}} \) oscillation. It manifests itself through a nonzero value of the phase \(\phi _s = - \mathrm {arg} \left( \lambda \right) \), where the parameter \(\lambda \equiv \mathrm {arg} \left[ \left( {q}/{p}\right) \left( {\overline{A}}/{A}\right) \right] \) describes \(CP\) violation in the interference between mixing and decay. Here, A and \(\overline{A}\) are the amplitudes for a \(B_s^{0}\) or a \({{\overline{B}{}} {}^0_{s}} \) meson to decay to the same final state and the complex parameters \(p = \langle B_s^{0}|B_\mathrm{L}\rangle \) and \(q=\langle {{\overline{B}{}} {}^0_{s}} |B_\mathrm{L}\rangle \) describe the relation between the flavour and the mass eigenstates, light, L, and heavy, H. The two eigenstates have a decay width difference \(\Delta \Gamma _s \equiv \Gamma _\mathrm{L} - \Gamma _\mathrm{H}\) and a mass difference \(\Delta m_s \equiv m_\mathrm{H} - m_\mathrm{L}\). In the absence of \({CP} \) violation in the decay and assuming negligible \({CP} \) violation in \({{B} ^0_{s}} \)-\({{\overline{B}{}} {}^0_{s}} \) mixing [3], \(|\lambda |\) is expected to be unity. In the SM, ignoring subleading contributions, the phase \(\phi _{s}\) can be related to the CKM matrix elements \(V_{ij}\), such that \(\phi _s \approx -2\beta _s\), where \(\beta _s \equiv \arg [ - (V_{ts}V_{tb}^*)/(V_{cs}V_{cb}^*)]\). Global fits to experimental data, assuming unitarity of the CKM matrix, give a precise prediction of a small value, namely \(-2\beta _s = -0.0369 ^{+0.0010}_{-0.0007}\mathrm { \,rad} \) according to the CKMfitter group [4] and \(-2\beta _s = -0.0370 \pm 0.0010\mathrm { \,rad} \) according to the UTfit collaboration [5]. However, many NP models [6, 7] predict larger values for this phase if non-SM particles were to contribute to \({{B} ^0_{s}} \)\({{\overline{B}{}} {}^0_{s}} \) oscillations, while satisfying all existing constraints. Thus, a measurement of \(\phi _s\) different from the SM prediction would provide clear evidence for NP.

Due to its high yield and clean experimental signature, the most sensitive decay channel to NP contributions is \({{B} ^0_{s}} \rightarrow J/\psi (\rightarrow \mu ^+ \mu ^-) {{K} ^+} {{K} ^-} \) [8], where the kaon pair predominantly originates from the decay of a \(\phi (1020)\) resonance.Footnote 1 Angular momentum conservation in the decay implies that the final state is an admixture of \(CP\)-even and \(CP\)-odd components, with orbital angular momentum of 0 or 2, and 1, respectively. Moreover, along with the three polarisation states of the \(\phi \) meson (P-wave states), there is also a \(CP\)-odd \({{K} ^+} {{K} ^-} \) component in an S-wave state [9]. The data can therefore be described considering four polarisation amplitudes \(A_g = |A_g|e^{-i\delta _g}\), where the indices \(g \in \{0,\parallel ,\perp ,S\}\) refer to the longitudinal, transverse-parallel and transverse-perpendicular relative orientations of the linear polarisation vectors of the \(J/\psi \) and \(\phi \) mesons and S to the single S-wave amplitude, respectively. The \(CP\)-even and \(CP\)-odd components are disentangled by a decay-time-dependent angular analysis, where the angular observables \(\cos \theta _{K}\), \(\cos \theta _{\mu }\) and \(\phi _{h}\) are defined in the helicity basis as described in Ref. [10]. The polar angle \(\theta _{K}\) (\(\theta _{\mu }\)) is the angle between the \(K^+\) (\(\mu ^{+}\)) momentum and the direction opposite to the \({B} ^0_{s} \) momentum in the \(K^{+}K^{-}\) (\(\mu ^{+}\mu ^{-}\)) centre-of-mass system and \(\phi _{h}\) is the azimuthal angle between the \(K^{+}K^{-}\) and \(\mu ^{+}\mu ^{-}\) decay planes. The \(\phi _{h}\) angle is defined by a rotation from the \(K^{-}\) side of the \(K^{+}K^{-}\) plane to the \(\mu ^{+}\) side of the \(\mu ^{+}\mu ^{-}\) plane. The rotation is positive in the \(\mu ^{+}\mu ^{-}\) direction in the \({B} ^0_{s} \) rest frame.

A decay-time-dependent angular analysis also allows the determination of \(\Delta \Gamma _s\), and of the average \({{B} ^0_{s}} \) decay width, \(\Gamma _s \equiv \left( \Gamma _\mathrm{L} + \Gamma _\mathrm{H}\right) /2\). In the SM, \(\Gamma _s\) and \(\Delta \Gamma _s\) can be calculated within the framework of the heavy quark expansion (HQE) theory [11,12,13,14,15,16,17], where a perturbative expansion of the amplitudes in inverse powers of the b-quark mass is used to calculate b-hadron observables. The ratio of the average decay width of \({{B} ^0_{s}} \) and \({{B} ^0} \) mesons, \(\Gamma _s/\Gamma _d\), is usually the preferred observable to compare with experimental measurements as it allows the suppression of common uncertainties in the calculation. The predictions are \(\Delta \Gamma _s=0.088\pm 0.020{\mathrm { \,ps^{-1}}} \) [18] and \(\Gamma _s/\Gamma _d=1.0006\pm 0.0025\) [19]. The high precision of the ratio \(\Gamma _s/\Gamma _d\) makes it an excellent testing ground for the validity of the HQE  [19, 20]. In addition, \(\Delta \Gamma _s\) can provide bounds complementary to those from \(\Gamma _s/\Gamma _d\) on quark-hadron duality violation [21].

Measurements of \(\phi _s\), \(\Delta \Gamma _s\) and \(\Gamma _s\) using \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) decays, with \(J/\psi \rightarrow \mu ^+ \mu ^-\), have been previously reported by the D0 [22], CDF [23], ATLAS [24, 25], CMS [26] and LHCb [27] collaborations. The LHCb collaboration has also exploited different decay channels, namely \({{B} ^0_{s}} \rightarrow J/\psi {{\pi } ^+} {{\pi } ^-} \) [28], \({{B} ^0_{s}} \rightarrow \psi (2S) \phi \) [29], \({{B} ^0_{s}} \rightarrow D_s^+ D_s^-\) [30] and \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) for the \({{K} ^+} {{K} ^-} \) invariant-mass region above 1.05\(\mathrm {\,Ge}{\mathrm {V\!/}c^2}\)  [31]. The world-average values, including all of the above mentioned results, are \(\phi _s=-0.021 \pm 0.031\mathrm { \,rad} \), \(\Delta \Gamma _s = 0.085 \pm 0.006{\mathrm { \,ps^{-1}}} \) and \(\Gamma _s/\Gamma _d= 1.006 \pm 0.004\) [32]. They are in agreement with the abovementioned predictions.

The main parameters of interest in this paper are \({\phi _{{s}}} \), \(|\lambda |\), \({\Gamma _{{s}}}-{\Gamma _{{d}}} \), \({\Delta \Gamma _{{s}}} \) and \(\Delta m_{s}\) measured in \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) decays, in the \({{K} ^+} {{K} ^-} \) mass region 0.99–1.05\(\mathrm {\,Ge}{\mathrm {V\!/}c^2}\). The new measurement reported is based on a data sample of proton–proton collisions recorded at a centre-of-mass energy of \(\sqrt{s}= 13{\mathrm {\,Te}{\mathrm {V}}} \) in 2015 and 2016 during Run 2 of LHC operation, corresponding to an integrated luminosity of \(0.3\,\mathrm{fb}^{-1} \) and \(1.6\,\mathrm{fb}^{-1} \), respectively. The decay width difference \(\Gamma _s-\Gamma _d\) is determined using \({{B} ^0} \rightarrow J/\psi {{K} ^+} {{\pi } ^-} \) decays as a reference, reconstructed in the same data set as the signal. The \({{K} ^+} {{\pi } ^-} \) in the final state originates predominantly from the decay of a \(K^*(892)^0\) resonance. The analysis procedure gives access to \(\Gamma _s-\Gamma _d\) rather than \(\Gamma _s\) due to the dependence of the time efficiency parametrisation on \(\Gamma _d\). This allows the determination of \(\Gamma _s-\Gamma _d\) with a significant reduction of the systematic uncertainty associated with lifetime-biasing selection requirements compared to the previous measurement. Taking as an input the precisely known value of \(\Gamma _d\) [32], the ratio \(\Gamma _s/\Gamma _d\) may be determined with higher precision with respect to measuring the two lifetimes independently.

In this analysis, the polarisation-independent \(CP\) -violating parameter \(\lambda _r\), associated with each polarisation state r, is defined such that \(\lambda _r= \eta _r \lambda \), where \(\eta _r = +1 \,\ \text {for }r\in \{0,\parallel \}\) and \(\eta _r =-1 \, \text {for } r\in \{\perp ,\mathrm{S}\}\). As a consequence, \(\phi _s = -\arg {\lambda }\). However, this assumption can be relaxed such that the values of \(\phi _{s}^{r}\) and \(|\lambda _r|\) are measured separately for each polarisation state. In addition, the following quantities are measured: the \(\phi \) polarization fractions \(|A_{0}|^{2}\) and \(|A_{\perp }|^{2}\); the strong-phase differences \(\delta _{\perp } - \delta _{0}\) and \(\delta _{\parallel } - \delta _{0}\); the fraction of S-wave, \(F_\mathrm{S}\), and the phase difference \(\delta _{S}-\delta _{\perp }\). The S-wave parameters are measured in bins of \(m({{K} ^+} {{K} ^-})\). The sum \(|A_{0}|^{2}+|A_{\perp }|^{2}+|A_{\parallel }|^{2}\) equals unity and by convention \(\delta _{0}\) is zero.

After a brief description of the LHCb detector in Sect. 2, the candidate selection and the background subtraction using the sPlot technique [33] are outlined in Sect. 3. The relevant inputs to the analysis, namely the decay-time resolution, the decay-time efficiency, the angular efficiency and the flavour-tagging calibration, are described in Sects. 4, 56 and 7, respectively. The sFit procedure [34], the evaluation of the systematic uncertainties and the results are discussed in Sects. 89 and 10, respectively. The combination of the results obtained in this analysis with those measured by the LHCb collaboration using data collected in 2011 and 2012 and determined using 2015 and 2016 \({{B} ^0_{s}} \rightarrow J/\psi {{\pi } ^+} {{\pi } ^-} \) data is presented in Sect. 11. Finally, conclusions are drawn in Sect. 12.

2 Detector and simulation

The LHCb detector [35, 36] is a single-arm forward spectrometer covering the pseudorapidity range \(2<\eta <5\), designed for the study of particles containing \(b \) or \(c \) quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about \(4{\mathrm {\,Tm}}\), and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of the momentum, \(p\), of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200\(\mathrm {\,Ge}{\mathrm {V\!/}c}\). The minimum distance of a track to a primary vertex (PV), the impact parameter (IP), is measured with a resolution of \((15+29/p_{\mathrm {T}}){\,\upmu \mathrm {m}} \), where \(p_{\mathrm {T}}\) is the component of the momentum transverse to the beam, in \(\mathrm {\,Ge}{\mathrm {V\!/}c}\). Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers.

Samples of simulated events are used to optimise the signal selection, to derive the angular efficiency and to correct the decay-time efficiency. In simulations, pp collisions are generated using Pythia  [37, 38] with a specific LHCb configuration [39]. Decays of hadronic particles are described by EvtGen  [40], in which final-state radiation is generated using Photos  [41]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [42, 43] as described in Ref. [44]. The \({{B} ^0_{s}} \rightarrow J/\psi \phi \) simulated sample used in this analysis is generated taking into account the three possible polarization states of the \(\phi \) meson while S-wave contributions are not included.

3 Selection and mass fit

Events are first required to pass an online event selection performed by a trigger [45], which consists of a hardware stage, based on information from calorimeters and muon systems, followed by a software stage, which applies a full event reconstruction. At the hardware stage, events are required to have a muon with high \(p_{\mathrm {T}} \) or a hadron, photon or electron with high transverse-energy deposit in the calorimeters. A difference with respect to the previous analysis is that all the events passing any of the hardware trigger requirements are accepted. This increases the signal yield by 13% in 2015 and by 7% in 2016 with respect to using the muon system information only. The different signal gain in the two data taking years is due to tighter L0 trigger thresholds employed in the 2015 data. The subsequent software trigger consists of two separate stages. In the first stage, the events can be divided into two categories. In the first category, they are required to have two well-identified oppositely charged muons with invariant mass larger than 2700\(\mathrm {\,Me}{\mathrm {V\!/}c^2}\). This trigger has an almost uniform efficiency as a function of \({{B} ^0_{s}} \) decay time and will be referred to as unbiased. In the second category, events are retained if there is at least one muon with transverse momentum larger than about 1\(\mathrm {\,Ge}{\mathrm {V\!/}c}\) and with a large impact-parameter significance with respect to all PVs in the event. The latter is defined as the difference in the vertex-fit \(\chi ^2\) of the PV fitted with and without the considered track. Events are also included in the second category if they pass the selection by a multivariate algorithm that identifies a two-track good-quality secondary vertex with a large scalar sum of the \(p_{\mathrm {T}} \) of the associated charged particles and a significant displacement from the PVs. These triggers, whose selection thresholds changed slightly between 2015 and 2016 data taking, introduce a nontrivial dependence of the efficiency on the \({{B} ^0_{s}} \) decay time and will be referred to as biased. In the second stage of the trigger, events containing a \({\mu ^+} \mu ^-\) pair with invariant mass within 120\(\mathrm {\,Me}{\mathrm {V\!/}c^2}\) of the \(J/\psi \) mass [46] and which form a vertex that is significantly displaced from the PV are selected, introducing another small decay-time bias.

In the offline selection, the \(J/\psi \) meson candidates are formed from two oppositely charged particles, originating from a common vertex, which are identified as muons and which have \(p_{\mathrm {T}} \) larger than 500\(\mathrm {\,Me}{\mathrm {V\!/}c}\). The invariant mass of the \({\mu ^+} \mu ^-\) pair, \(m({\mu ^+} \mu ^-)\), must be in the range 3020–3170\(\mathrm {\,Me}{\mathrm {V\!/}c^2}\). The \(J/\psi \) meson candidates are combined with \({{K} ^+} {{K} ^-} \) candidates formed from two oppositely charged particles that are identified as kaons and that originate from a common vertex. The \({{K} ^+} {{K} ^-} \) pair is required to have \(p_{\mathrm {T}} \) larger than 500\(\mathrm {\,Me}{\mathrm {V\!/}c}\). The invariant mass of the \({{K} ^+} {{K} ^-} \) pair, \(m({{K} ^+} {{K} ^-})\), must be in the range 990–1050\(\mathrm {\,Me}{\mathrm {V\!/}c^2}\). The \({{B} ^0_{s}} \) candidates are reconstructed by combining the \(J/\psi \) candidate with the \({{K} ^+} {{K} ^-} \) pair, requiring that they form a good vertex and have an invariant mass, \(m(J/\psi {{K} ^+} {{K} ^-})\), in the range 5200–5550\(\mathrm {\,Me}{\mathrm {V\!/}c^2}\). The \({B} ^0_{s} \) origin vertex is defined as the PV in the interaction, or if multiple PVs are reconstructed the PV with the minimum value of the \({{B} ^0_{s}} \) impact parameter significance is associated with the candidate. The invariant mass is calculated from a kinematic fit that constrains the \({{B} ^0_{s}} \) candidate to originate from its origin vertex and constrains \(m({\mu ^+} \mu ^-)\) to the known \(J/\psi \) mass [46]. When deriving the decay time, t, and the helicity angles of the \({{B} ^0_{s}} \) candidate the origin vertex constraint is also applied. In addition, t is required to be in the range 0.3–15.0\({\mathrm { \,ps}}\), which suppresses a large fraction of prompt combinatorial background whilst having a negligible effect on the sensitivity to \(\phi _s\). The kinematic fit also estimates a per-candidate decay-time uncertainty, \(\delta _t\).

The selection is optimised with respect to the previous analysis [27] by means of a gradient-boosted decision tree (BDT) [47, 48], which is used to further suppress combinatorial background. To train the BDT, simulated \({{B} ^0_{s}} \rightarrow J/\psi \phi \) candidates are used as a signal sample and data candidates with \(m(J/\psi {{K} ^+} {{K} ^-})\) in the range 5450–5550\(\mathrm {\,Me}{\mathrm {V\!/}c^2}\) are used as a sample of combinatorial background. The simulation is corrected to match the distributions observed in data of particle identification variables, the \({{B} ^0_{s}} \) transverse momentum and pseudorapidity, the quality of the muon and kaon track fits and the number of tracks in an event with measurements both in the VELO and the tracking stations. Various input quantities are used in the BDT to exploit the features of the signal decay in order to distinguish it from background, namely the track-fit \(\chi ^2\) of the final-state particles, the particle identification probability as provided mainly from the RICH and muon systems, the quality of the candidate \(J/\psi \) and \({{B} ^0_{s}} \) decay vertices, the \(p_{\mathrm {T}} \) of the \({{B} ^0_{s}} \) candidate and of the \({{K} ^+} {{K} ^-} \) combination and the \({{B} ^0_{s}} \) IP with respect to its origin vertex. The selection requirement on the BDT output is chosen to maximise the effective signal sample size approximated by the square of the sum of sWeights divided by sum of squared sWeights.

In addition to combinatorial background, studies of the data in sidebands of the \(m(J/\psi {{K} ^+} {{K} ^-})\) spectrum show contributions from approximately 5200 \({{\varLambda } ^0_{b}} \rightarrow J/\psi p {{K} ^-} \) (350 \({{B} ^0} \rightarrow J/\psi {{K} ^+} {{\pi } ^-} \)) decays where the proton (pion) is misidentified as a kaon. These backgrounds lie around the \({{B} ^0_{s}} \) signal peak in the \(m(J/\psi {{K} ^+} {{K} ^-})\) distribution, as shown in Fig. 1. These contributions are suppressed using more stringent kaon identification requirements if the \(m(J/\psi {{K} ^+} {{K} ^-})\) mass, with the kaon interpreted as a proton (pion), lies within \(15\mathrm {\,Me}{\mathrm {V\!/}c^2} \) around the \({{\varLambda } ^0_{b}} \) (\({{B} ^0} \)) known mass [46]. This reduces the \({{B} ^0} \rightarrow J/\psi {{K} ^+} {{\pi } ^-} \) peaking background contribution to approximately 120 decays. This background is neglected and a systematic uncertainty is assigned to account for this approximation. The contribution due to the \({{\varLambda } ^0_{b}} \) background is \(1600\pm 160\), where the uncertainty includes statistical and systematic sources. The \({{\varLambda } ^0_{b}} \) background is statistically subtracted by inserting simulated \({{\varLambda } ^0_{b}} \) decays into the data sample with negative weights. This is done prior to the sPlot procedure, in which the combinatorial background is subtracted in a fit to \(m(J/\psi {{K} ^+} {{K} ^-})\). Correlations between the candidate mass and the angular variables are preserved and the simulated candidates are weighted such that the distributions of the kinematic variables used in the fit, and their correlations, match those of data.

Fig. 1
figure 1

Distribution of the invariant mass of \({B} ^0_{s} \) candidates, selected from simulated \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) (green filled area), \({{\varLambda } ^0_{b}} \rightarrow J/\psi p {{K} ^-} \) (solid red line) and \({{B} ^0} \rightarrow J/\psi {{K} ^+} {{\pi } ^-} \) (dotted blue line) decays. The distributions are weighted to correct differences in the kinematics and the resonance content between simulation and data

Fig. 2
figure 2

a Distribution of the invariant mass of selected \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) decays. The signal component is shown by the long-dashed red line, the background component by the dashed green line and the total fit function by the solid blue line. The background contribution due to \({{\varLambda } ^0_{b}} \rightarrow J/\psi p {{K} ^-} \) decays is statistically subtracted. The contribution from \({{B} ^0} \rightarrow J/\psi {{K} ^+} {{K} ^-} \)decays is not shown separately due to its small size. b Distribution of \({{K} ^+} {{K} ^-} \) invariant mass from selected \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) decays. The background is subtracted using the sPlot method. The dashed blue lines define the boundaries of the six \(m(K^{+}K^{-})\) bins that are used in the analysis

Figure 2a shows the \(m(J/\psi {{K} ^+} {{K} ^-})\) distribution and the result of an unbinned maximum-likelihood fit to the sample in the range 5200–5550\(\mathrm {\,Me}{\mathrm {V\!/}c^2}\). The sample is divided into 24 independent subsamples, corresponding to six bins in \(m({{K} ^+} {{K} ^-})\) with boundaries at 990, 1008, 1016, 1020, 1024, 1032, 1050\(\mathrm {\,Me}{\mathrm {V\!/}c^2}\), to the biased and the unbiased trigger categories, and to the year of data taking. The probability density function (PDF) used for the fit is independent for each of these subsamples and is composed of a single double-sided Crystal Ball (CB) [49] function for the signal and an exponential function for the combinatorial background. The CB tail parameters are fixed to those obtained from simulation.

The sPlot technique relies on the variable used for background subtraction to be uncorrelated with the variables to which the sWeights are applied. However, a correlation between the signal mass shape with \(\cos \theta _{\mu }\) is observed, due to the dependence of the mass resolution on the transverse momentum of the muons. The per-candidate mass uncertainty, \(\sigma _m\), obtained in the vertex and kinematic fit used to obtain \(m(J/\psi {{K} ^+} {{K} ^-})\), is found to represent a good proxy of \(\cos \theta _{\mu }\) due to its correlation with the \({B} ^0_{s} \) candidate mass resolution. Therefore, the signal function uses \(\sigma _m\) as a conditional observable. The width parameter \(\sigma _{\mathrm {CB}}\) of the double-sided CB function is parametrised as a quadratic function of the per-candidate mass uncertainty such that \(\sigma _{\mathrm {CB}} = a_1\sigma _m + a_2\sigma _m^2\), \(a_1\) and \(a_2\) are free parameters determined from the data. The quadratic dependence is motivated by simulation studies.

A small contribution from \({{B} ^0} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) background candidates is observed at the known \({{B} ^0} \) mass [46]. This contribution is included in the PDF and is modelled with a Gaussian distribution, where the mean is fixed to the fitted \({{B} ^0_{s}} \) mass minus the difference between \({{B} ^0_{s}} \) and \({{B} ^0} \) masses [46] and the resolution is fixed to 7\(\mathrm {\,Me}{\mathrm {V\!/}c^2}\), which is determined from a fit to the \({{B} ^0} \rightarrow J/\psi {{K} ^+} {{\pi } ^-} \) data control channel. Figure 2b shows the background-subtracted invariant-mass distributions of the \({{K} ^+} {{K} ^-} \) system in the selected \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) candidates. After the trigger and full offline selection requirements, the signal yield totals approximately 15 000 and 102 000 \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) decays in the 2015 and 2016 data sets, respectively.

The fraction of events containing more than one \({B} ^0_{s} \) candidate within the \(m(J/\psi K^{+}K^{-})\) range 5340–\(5400\mathrm {\,Me}{\mathrm {V\!/}c^2} \) is 0.3%. All candidates are retained in the subsequent stages of the analysis and a systematic uncertainty on the impact of allowing multiple candidates per event to be present in the analysis is assigned.

4 Decay-time resolution

The value and the uncertainty of the decay-time resolution strongly affects the relative precision on \(\phi _s\), thus the knowledge of the decay-time resolution calibration is pivotal. The resolution function is modelled with a Gaussian distribution with a mean of zero and a width \(\sigma _\mathrm{eff}\), where \(\sigma _\mathrm{eff}\) is determined using a sample of candidates constructed from combinations of \(J/\psi \), \({{K} ^+} \) and \({{K} ^-} \) candidates that originate predominantly in the primary interaction (prompt component). This sample is referred to as the prompt \(J/\psi {{K} ^+} {{K} ^-} \) sample. It is selected as described in Sect. 3 for \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) decays except for the lower limit requirement for the decay time, by making use of a different trigger line which is heavily prescaled.

The prompt component has zero decay time and is used to calibrate the detector resolution by studying the shape of the decay-time distribution around zero. This distribution is modelled by a delta function. In addition to the prompt component, there is a contribution at later decay times originating from \(J/\psi \) mesons produced in b-hadron decays, and a small fraction of a background due to candidates that have a decay time computed with respect to a wrong PV (wrong-PV component). The b-hadron component contributes to a tail at positive decay times and is described by two exponential functions. The shape of the wrong-PV component is determined from a data control sample in which the decay-time distribution of candidates is constructed by computing their decay time with respect to an independent PV from the following event. This contribution is found to be approximately \(0.5\%\) of the prompt sample.

The sum of the prompt and b-hadron components is convolved with a triple-Gaussian resolution function

$$\begin{aligned} \mathcal{R}(t) = \sum _{i=1}^{3} f_i \frac{1}{\sqrt{2\pi }\sigma _i} \exp \left[ -\frac{(t-\mu )^2}{2\sigma _i^2}\right] , \end{aligned}$$
(1)

where \(\sum _i f_i = 1\), \(\mu \) is a parameter that describes a bias in the decay time measurement and \(\sigma _i\) are the individual widths. The bias, \(\mu \), is assumed to be zero and a systematic uncertainty is assigned studying a possible deviation from this value. The rest of the parameters are determined from the fit.

Fig. 3
figure 3

a Decay-time distribution of the prompt \(J/\psi {{K} ^+} {{K} ^-} \) calibration sample with the result of an unbinned maximum-likelihood fit overlaid in blue. The overall triple-Gaussian resolution is represented by the dashed red line, while the two long-lived and the wrong-PV components are shown by the long-dashed-dotted and dashed-multiple-dotted brown and pink lines and the long-dashed purple line, respectively. b Variation of the effective single-Gaussian decay-time resolution, \(\sigma _\mathrm{eff}\), as a function of the estimated per-candidate decay-time uncertainty, \(\delta _t\), obtained from the prompt \(J/\psi {{K} ^+} {{K} ^-} \) sample. The red line shows the result of a linear fit. The data points are positioned at the barycentre of each \(\delta _t\) bin. The shaded histogram (see right y axis) shows the distribution of \(\delta _t\) in the background-subtracted \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) sample

The calibration sample is split into eleven subsets according to the per-candidate decay-time uncertainty, \(\delta _t\). The model is fit to the decay-time distribution in order to extract the parameters governing the decay-time resolution of Eq. (1) as shown in Fig. 3a. The dilution of the amplitude of the \({{B} ^0_{s}} \)-\({{\overline{B}{}} {}^0_{s}} \) oscillation due to the calibrated resolution is determined in each bin of \(\delta _t\) as

$$\begin{aligned} \mathcal {D} = \sum _{i=1}^{3}f_i \exp \left[ -\sigma _i^2\Delta m_s^2/2\right] , \end{aligned}$$
(2)

and is then used to evaluate an effective single-Gaussian width given by

$$\begin{aligned} \sigma _\mathrm{eff} = \sqrt{(-2/\Delta m_s^2)\ln \mathcal {D}}. \end{aligned}$$
(3)

This effective single-Gaussian resolution of width \(\sigma _\mathrm{eff}\) gives the same damping effect on the magnitude of the \({{B} ^0_{s}} \) meson oscillation as the triple-Gaussian model. Figure 3b shows the variation of \(\sigma _\mathrm{eff}\) as a function of \(\delta _t\). The variation is fit with a linear function \(\sigma _{\mathrm {eff}}(\delta _t) = b_0 + b_1\delta _t\) to determine the calibration parameters \(b_0 = 12.97 \pm 0.22\mathrm { \,fs} \) and \(b_1 = 0.846 \pm 0.006\), where the uncertainties are statistical only. A quadratic dependence is also evaluated and used as an alternate model to compute a systematic uncertainty. The calibration procedure is validated using simulated signal and prompt samples. The difference between the effective resolutions obtained in these simulated samples is approximately 0.8\(\mathrm { \,fs}\) and is treated as a source of systematic uncertainty.

The result of the calibration leads to an effective single-Gaussian resolution function averaged over the \(\delta _t\) bins with \(\sigma _\mathrm{eff} = 45.54\pm 0.04\pm 0.05\mathrm { \,fs} \), where the first uncertainty is statistical, and the second contribution comes from the uncertainties on the calibration parameters. This corresponds to a dilution \(\mathcal{D}=0.721\pm 0.001\) assuming \(\Delta m_s = 17.757\pm 0.021\) \({\mathrm { \,ps^{-1}}}\)  [32].

5 Decay-time efficiency

The selection and reconstruction efficiency depends on the \(B^0_s\) decay time due to displacement requirements made on the signal tracks and a decrease in reconstruction efficiency for tracks with large impact parameter with respect to the beam line [50]. The efficiency as a function of the decay time is determined using a new technique with respect to Ref. [27], exploiting the \({{B} ^0} \rightarrow J/\psi {{K} ^+} {{\pi } ^-} \) decay, with \(J/\psi \rightarrow \mu ^+ \mu ^-\), as a control sample. This control mode is kinematically similar to the signal decay. Since the decay-width difference between the two mass eigenstates in the \({{B} ^0} \) system is measured to be consistent with zero [46], \({{B} ^0} \rightarrow J/\psi {{K} ^+} {{\pi } ^-} \) candidates are assumed to have a purely exponential decay-time distribution with lifetime \(\tau ^{{{B} ^0}}_{\mathrm {data}} = 1.520{\mathrm { \,ps}} \) [32]. The \({B} ^0_{s} \) efficiency is determined via a simultaneous fit to background-subtracted data and simulated samples through the relation

$$\begin{aligned} \varepsilon ^{{{B} ^0_{s}}}_{\mathrm {data}}(t) = \varepsilon ^{{{B} ^0}}_{\mathrm {data}}(t) \times \frac{\varepsilon ^{{{B} ^0_{s}}}_{\mathrm {sim}}(t)}{\varepsilon ^{{{B} ^0}}_{\mathrm {sim}}(t)} , \end{aligned}$$
(4)

where \(\varepsilon ^{{{B} ^0}}_{\mathrm {data}}(t)\) is the efficiency of the control channel and \(\varepsilon ^{{{B} ^0_{s}}}_{\mathrm {sim}}(t)/ \varepsilon ^{{{B} ^0}}_{\mathrm {sim}}(t)\) is the ratio of efficiencies of simulated signal and reference decays after reconstruction and selection. Residual differences between either signal and control mode or data and simulation are automatically corrected for in the ratio of Eq. (4). In order to correct first-order differences between the \({{B} ^0_{s}} \) and \({{B} ^0} \) data samples, the latter is weighted to match the p and \(p_{\mathrm {T}} \) distribution of \({{B} ^0_{s}} \) data. In addition, both \({{B} ^0_{s}} \) and \({{B} ^0} \) simulated samples are weighted to match the \(p_{\mathrm {T}} \) distribution of the \({{B} ^0_{s}} \) data sample. The simulated samples are further corrected according to the ratio of the PDF used to generate them and the PDF obtained with the parameters measured in data [27, 51]. Together with an additional weighting to match the \(m({{K} ^+} {{\pi } ^-})\) and \(m({{K} ^+} {{K} ^-})\) distributions in data, this procedure reproduces the correct mixture of P- and S-waves in the \({{K} ^+} {{\pi } ^-} \) and \({{K} ^+} {{K} ^-} \) final state. The decay-time efficiency is obtained separately for the data-taking periods 2015 and 2016 and the two trigger categories.

The \({{B} ^0} \rightarrow J/\psi {{K} ^+} {{\pi } ^-} \) candidates are selected using trigger and preselection requirements similar to those of the \({{B} ^0_{s}} \rightarrow J/\psi K^+ K^-\) channel. The main difference is an additional selection on the pion-identification requirement, in order to reduce the probability of reconstructing two different \({{B} ^0} \) candidates by swapping the kaon and pion mass hypotheses. In addition, the \(p_{\mathrm {T}}\) of the pion is required to be larger than \(250\mathrm {\,Me}{\mathrm {V\!/}c} \) to reduce the number of multiple candidates per event to 0.5\(\%\) in the \(m(J/\psi K^+ \pi ^-)\) region 5260–\(5300\mathrm {\,Me}{\mathrm {V\!/}c^2} \). The invariant mass of the kaon-pion pair is required to be in the range 826–\(966\mathrm {\,Me}{\mathrm {V\!/}c^2} \). The BDT as trained and optimised on the signal channel is used, applying the same selection requirement. Several potential peaking backgrounds arising from the misidentification of particles are considered but they are all found to be negligible. A small contribution from \(B^0_s \rightarrow J/\psi K^+ \pi ^-\) decays is removed by selecting candidates with \(m(J/\psi K^+ \pi ^-)<5350\mathrm {\,Me}{\mathrm {V\!/}c^2} \).

Fig. 4
figure 4

Distribution of the invariant mass of selected a \({{B} ^0} \rightarrow J/\psi K^+ \pi ^-\) and b \(B^+ \rightarrow J/\psi K^+\) decays used for the calibration and validation of the decay-time efficiency. The signal component is shown by the long-dashed red line, the background component by the dashed green line and the total fit function by the solid blue line

Figure 4a shows the \(m(J/\psi K^+ \pi ^-)\) distribution and corresponding result of an unbinned maximum-likelihood fit to the sample. The model used for the fit is the same for the 2015 and 2016 data-taking periods and the two trigger categories but with independently fitted parameters. It is composed of a Hypatia [52] function for the signal, where the parameters describing the tails are fixed to the values obtained from simulation, and an exponential function for the combinatorial background. In total, 75,000 and 480,000 \({{B} ^0} \) mesons are found in 2015 and 2016, respectively. The result of this fit is used to statistically subtract the background when determining the decay-time efficiency in data, by using weights computed with the sPlot technique.

The PDF used to describe the decay-time distribution of the \({{B} ^0} \) data, and of the \({{B} ^0_{s}} \) and \({{B} ^0} \) simulated samples is composed of the product of the efficiency function and a single exponential function, convolved with a single Gaussian resolution function centred at zero. For the \({{B} ^0} \) candidates, the width of the resolution function is set to 39\(\mathrm { \,fs}\) and 42\(\mathrm { \,fs}\) for the simulated and data samples, respectively. The first value is obtained from simulation, and the second value is obtained by scaling the \({{B} ^0_{s}} \) resolution obtained in data, as described in Sect. 4, by the ratio seen between the \({{B} ^0} \) and \({{B} ^0_{s}} \) resolutions in simulated samples. A \({{B} ^0_{s}} \) simulated sample is generated with \(\Delta \Gamma _s=0{\mathrm { \,ps^{-1}}} \) and thus a single exponential function is used to determine \(\varepsilon _{\mathrm {sim}}^{{{B} ^0_{s}}}\). As a cross-check, the decay-time efficiency is also derived from the nominal \({{B} ^0_{s}} \rightarrow J/\psi \phi \) simulated sample, weighted to have \(\Delta \Gamma _s = 0{\mathrm { \,ps^{-1}}} \) such that the same fitting strategy can be used as defined above. The difference between these two strategies is considered as a source of systematic uncertainty.

The efficiency functions are parametrised using cubic splines with nodes at 0.3, 0.58, 0.91, 1.35, 1.96, 3.01, 7.00\({\mathrm { \,ps}}\) and the first coefficient fixed to unity. The node positions are defined as to create six uniformly populated bins in the interval 0.3–\(15{\mathrm { \,ps}} \), assuming an exponential distribution with \(\Gamma = 0.66 {\mathrm { \,ps}} ^{-1}\). The position of the last node is chosen due to the lack of candidates at large decay times in the 2015 data control sample. The final decay-time efficiencies, \(\varepsilon ^{{{B} ^0_{s}}}_{\mathrm {data}}(t)\), are shown in Fig. 5. The structure around 1\({\mathrm { \,ps}}\) visible in Fig. 5a, c is due to the different definition of the origin vertex used in the trigger and in the offline selection.

Fig. 5
figure 5

Decay-time efficiency for the a 2015 unbiased, b 2015 biased, c 2016 unbiased and d 2016 biased \({{B} ^0_{s}} \rightarrow J/\psi \phi \) sample. The cubic-spline function described in the text is shown by the blue line. For comparison, the black points show the efficiency when computed using histograms for each of the input component efficiencies

The full procedure is validated in data using two approaches where the \({B} ^0_{s} \) samples are replaced with alternative \(B \) meson samples of known lifetime. First, a sample of approximately 1.6 million \(B^+ \rightarrow J/\psi (\rightarrow {\mu ^+} \mu ^-) K^+\) candidates is reconstructed in the same data set as the \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) candidates and selected using similar selection requirements. The mass distribution of these candidates is shown in Fig. 4b. This sample is used to measure the difference of the \({{B} ^+} \) and \({{B} ^0} \) decay widths, \(\Gamma _u-\Gamma _d\), with the same methods used for the measurement of \(\Gamma _s-\Gamma _d\). A simulated sample of \({{B} ^+} \) decays is used in the calculation of the numerator of Eq. (4) and this sample is corrected such that the particle-identification, event-multiplicity and other kinematic and selection variables match those in data. The measured difference of decay widths is \(\Gamma _u-\Gamma _d = -0.0478 \pm 0.0013{\mathrm { \,ps^{-1}}} \), where the uncertainty is statistical only. This is in agreement with the world average value, \(-0.0474 \pm 0.0023{\mathrm { \,ps^{-1}}} \) [46], and validates the measurement of \(\Gamma _s-\Gamma _d\) with a precision of 0.003\({\mathrm { \,ps^{-1}}}\).

A similar test is done using the \({{B} ^0} \rightarrow J/\psi {{K} ^+} {{\pi } ^-} \) decays both as the signal and the reference to measure a null decay-width difference. The sample is split into two independent sets according to different selection criteria, where one is used to evaluate the decay-time efficiency with the procedure defined above, and the other is used as the signal sample. In all cases, the measured decay-width difference is found to be consistent with zero with a precision around 0.003\({\mathrm { \,ps^{-1}}}\).

6 Angular efficiency

The LHCb detector geometry and the selection requirements give rise to efficiencies that vary as a function of the helicity angles \(\theta _K\), \(\theta _\mu \) and \(\phi _h\). The three-dimensional angular-efficiency correction is determined from simulated signal events to which the same trigger and selection criteria as in the data are applied. The efficiency is evaluated separately for the different years of data taking and for the two trigger categories. Two sets of corrections are applied to the simulated events such that they match the data. First, the simulated samples are weighted, using a boosted decision tree method [53], to match the \(p_{\mathrm {T}}\), p and \(m({{K} ^+} {{K} ^-})\) distributions of the \({B} ^0_{s} \) signal. A second procedure is performed to correct the differences observed in the kinematic distributions of the final-state particles and the fact that the simulated events do not include \({{K} ^+} {{K} ^-} \) pairs in an S-wave configuration. This correction is implemented as an iterative procedure that gradually modifies the simulation such that the S-wave fraction matches the value measured in the data. As a result, the agreement of the kaon momentum and \(p_{\mathrm {T}}\) distributions between the simulation and the data is improved. The efficiencies as a function of the three helicity angles are shown for illustration in Fig. 6. The angular efficiency correction is introduced in the analysis through normalisation weights in the PDF describing the signal decays in the fit of Sect. 8, following the procedure described in Ref. [54]. The weights are calculated using simulated candidates and their statistical uncertainties are propagated to the parameters of interest as a systematic uncertainty.

Fig. 6
figure 6

Normalised angular efficiency as a function of a \(\cos \theta _K\), b \(\cos \theta _\mu \) and c \(\phi _h\), where in all cases the efficiency is integrated over the other two angles. The efficiency is evaluated using simulated \({{B} ^0_{s}} \rightarrow J/\psi \phi \) decays that have been weighted to match the kinematics and physics of \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) decays in data, as described in the text. The points are obtained by dividing the angular distribution in the simulated sample by the distribution expected without any efficiency effect and the curves represent an even fourth-order polynomial parameterisation of each one-dimensional efficiency. The figure is for illustration only as the angular efficiency is accounted for by normalisation weights in the signal PDF

A cross-check of the angular efficiency procedure is made using the \({{B} ^0} \rightarrow J/\psi {{K} ^+} {{\pi } ^-} \) data and \({{B} ^0} \rightarrow J/\psi K^*(892)^0(\rightarrow K^{+}\pi ^{-})\) simulated samples. The simulation contains \(K^{+}\pi ^{-}\) systems in P-wave only and is corrected to match the kinematic distributions of the data using the iterative method defined above and the angular-efficiency weights are determined. The P- and S-wave \({{B} ^0} \rightarrow J/\psi K^{+}\pi ^{-}\) polarisation amplitudes are measured by means of an unbinned fit to the distribution of helicity angles of the final-state particles and found to be consistent with those in Ref. [55].

Another high-precision test of the angular-efficiency correction is made by using the large sample of \({{B} ^+} \rightarrow J/\psi {{K} ^+} \) decays presented in Sect. 5. In \({{B} ^+} \rightarrow J/\psi {{K} ^+} \), the helicity angle \(\theta _\mu \) distribution follows a \(1-\cos ^2\theta _\mu \) dependence. The \({{B} ^+} \) data sample is split into nine disjoint subsets according to the pseudorapidity of the \({{B} ^+} \) meson, to check the large efficiency variation as a function of this quantity. In each subset, background is subtracted with the sPlot technique using the \({{B} ^+} \) candidate mass as a discriminating variable. Prior to any angular efficiency correction, the \(\theta _\mu \) distribution presents up to a 30% deviation from the expected shape, three times larger than in \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) decays. However, when the \({{B} ^+} \rightarrow J/\psi {{K} ^+} \) simulation is used to correct the data with the same method used for this analysis, a fit of the background-subtracted and efficiency-corrected data demonstrates that the expected distribution is fully recovered in each bin, with an overall precision of about 0.1\(\%\). The test is stable against variation of the binning of the \({{B} ^+} \) sample and choice of different variables used to correct the simulation to match the data with respect to the baseline strategy.

7 Tagging the \({B} ^0_{s} \) meson flavour at production

The determination of the initial flavour of the \({B} ^0_{s} \) meson, called tagging, is a fundamental component for measuring \(CP\) asymmetries in the decays of \({B} ^0_{s} \) mesons to \(CP\) eigenstates. Two classes of algorithms are used. The opposite side (OS) tagger exploits the fact that b and \(\overline{b}\) quarks are almost exclusively produced in pairs in pp collisions, allowing the flavour of the signal \({B} ^0_{s} \) candidate to be inferred from the flavour of the other b hadron in the event. The OS tagger combines information on the charge of the muon or electron from semileptonic b decays, the charge of the kaon from the \(b\rightarrow c\rightarrow s\) decay chain, the charge of a reconstructed secondary charm hadron and the charges of the tracks that form the secondary vertex of the other b-hadron decay, combined into a weighted average, with weights depending on the transverse momenta of the tracks. The same-side kaon (SSK) tagger exploits the additional correlated kaon that tends to be produced during the hadronisation of the \(\overline{{b}}\) (\(b \)) quark that forms the signal \({B} ^0_{s} \) (\({\overline{B}{}} {}^0_{s} \)) candidate, with its initial flavour identified by the kaon charge. These flavour tagging algorithms have been revisited and optimised using Run 2 data [56], obtaining significantly higher combined tagging performances with respect to Run 1. Further details on the OS and SSK taggers can be found in Refs. [57,58,59].

The tagging algorithms each provide a flavour-tagging decision, \(\mathfrak {q}\), and an estimate, \(\eta \), of the probability that the decision is incorrect (mistag) for each reconstructed \({B} ^0_{s} \) candidate. The tagging decision takes the value \(+1\) (\(-1\)) for each tagged \({B} ^0_{s} \) (\({\overline{B}{}} {}^0_{s} \)) candidate and 0 if the taggers cannot make a decision (untagged). The mistag probability is defined in the range from 0 to 0.5, since \(\eta > 0.5\) corresponds to the opposite decision with a mistag of \((1-\eta )\). For untagged events \(\eta \) is 0.5.

Each tagging algorithm is implemented as a BDT that is trained and optimised using large samples of simulated b-hadron decays for the SSK tagger and a large data sample of \({{B} ^+} \rightarrow J/\psi {{K} ^+} \) decays for the OS tagger. The mistag probability for each tagger is given by the output of the BDT, which is calibrated using dedicated data control channels to relate \(\eta \) to the true mistag probability, \(\omega \), as described in the following sections. Each tagger has a corresponding tagging power given by \(\epsilon _\mathrm{tag}D^2\), where \(\epsilon _\mathrm{tag}\) is the fraction of tagged candidates and \(D=1-2\omega \) is the dilution induced on the amplitude of the \({B} ^0_{s} \) oscillation. The tagging power represents the effective reduction in statistical power due to imperfect tagging.

7.1 Opposite-side tagging

The OS tagging algorithm is calibrated using the sample of \({{B} ^+} \rightarrow J/\psi {{K} ^+} \) decays (Sect. 5), whose flavour is determined by the kaon charge. This sample of \({{B} ^+} \rightarrow J/\psi {{K} ^+} \) decays is independent of that used to train and optimise the BDT of the tagging algorithm. The result of the fit to the distribution of \(m(J/\psi {{K} ^+})\) shown in Fig. 4b is used to compute sWeights, which are applied in subsequent stages of the analysis to subtract the background. The \({{B} ^+} \rightarrow J/\psi {{K} ^+} \) sample is further weighted to match the background-subtracted \({{B} ^0_{s}} \rightarrow J/\psi \phi \) sample in the distributions of charged-track and PV multiplicities and the \(p_{\mathrm {T}}\) and rapidity of the \({B} \) meson.

Fig. 7
figure 7

Calibration of the OS tagger using \({{B} ^+} \rightarrow J/\psi {{K} ^+} \) decays. The black points show the average measured mistag probability, \(\omega \), in bins of predicted mistag, \(\eta \), the red line shows the calibration as described in the text and the yellow area the calibration uncertainty within one standard deviation. The shaded histogram shows the distribution, with arbitrary normalisation, of \(\eta \) in the background subtracted \({{B} ^0_{s}} \rightarrow J/\psi \phi \) sample, summing over candidates tagged as \({{B} ^0_{s}} \) or \({{\overline{B}{}} {}^0_{s}} \)

The calibration between true and estimated mistag for each tagging algorithm is given by an empirically determined linear relationship,

$$\begin{aligned} \omega (\eta )= & {} \left( p_0 + \frac{\Delta p_0}{2}\right) + \left( p_1 + \frac{\Delta p_1 }{2}\right) \left( \eta - \langle \eta \rangle \right) \end{aligned}$$
(5)
$$\begin{aligned} \overline{\omega }(\eta )= & {} \left( p_0 - \frac{\Delta p_0}{2}\right) + \left( p_1 - \frac{\Delta p_1 }{2}\right) \left( \eta - \langle \eta \rangle \right) , \end{aligned}$$
(6)

where \(\omega (\eta )\) and \(\overline{\omega }(\eta )\) are the calibrated mistag probabilities for \(B^+\) and \(B^-\) mesons, respectively, \(\Delta p_{0,1}\) are mistag asymmetries and \(\langle \eta \rangle \) is the average estimated mistag of the \({{B} ^+} \rightarrow J/\psi {{K} ^+} \) sample. The calibration parameters are determined from an unbinned maximum-likelihood fit to the \(\eta \) distribution of the probability

$$\begin{aligned} \mathcal {P}(a|\eta ) = (1-a)\overset{(-)}{\omega }(\eta ) + a(1-\overset{(-)}{\omega }(\eta )), \end{aligned}$$
(7)

for an initial flavour of the \(B^+\) (\(B^-\)) meson. The discrete variable a has the value 0 or 1 for an incorrect or correct tagging decision, respectively, based upon comparing the decision \(\mathfrak {q}\) to the kaon charge. Figure 7 shows the relation between the flavour-averaged value of \(\omega \) and \(\eta \) determined by the fit and the values of the measured mistag in bins of estimated mistag, which supports the use of a linear calibration function. The final calibration parameters are given in Table 1 and the overall tagging power for candidates with an OS tag only can be found in Table 2. Differences of the tagging efficiency are expected to be negligible as their effects are washed out by the fast \({{B} ^0_{s}} \)\({{\overline{B}{}} {}^0_{s}} \) oscillations. The applicability of the calibration from \({{B} ^+} \rightarrow J/\psi {{K} ^+} \) to \({{B} ^0_{s}} \rightarrow J/\psi \phi \) decays is tested using simulated samples and observed differences between the calibration parameters are treated as a source of systematic uncertainty. Variations in the parameters caused by the use of a different model for the combinatorial background in the fit to the \(m(J/\psi {{K} ^+})\) distribution are found to be negligible.

Table 1 Calibration parameters for the OS and SSK taggers. Where given, the first uncertainty is statistical and the second is systematic
Table 2 Overall tagging performance for \({{B} ^0_{s}} \rightarrow J/\psi K^{+}K^{-}\). The uncertainty on \(\epsilon _\mathrm{tag}D^{2}\) is obtained by varying the tagging calibration parameters within their statistical and systematic uncertainties summed in quadrature

7.2 Same-side tagging

Fig. 8
figure 8

Distribution of the invariant mass of selected \({{B} ^0_{s}} \rightarrow {{D} ^-_{s}} \pi ^+\) candidates (black points). The total fit function is shown as the solid blue line. The signal component is shown by the red long-dashed line, the combinatorial background by the light-blue short-dashed line and other small background components are also shown as specified in the legend. Only the dominant backgrounds are shown

The SSK tagger is calibrated by resolving the \({{B} ^0_{s}} \)-\({{\overline{B}{}} {}^0_{s}} \) flavour oscillations in a sample of flavour-specfic \({{B} ^0_{s}} \rightarrow {{D} ^-_{s}} {{\pi } ^+} \) decays. The amplitude of this oscillation is related to the averaged \({B} ^0_{s} \)\({\overline{B}{}} {}^0_{s} \) mistag probability, \(\tilde{\omega }\), via the PDF of the decay-time distribution of flavour-tagged \({{B} ^0_{s}} \rightarrow {{D} ^-_{s}} {{\pi } ^+} \) decays, given by

$$\begin{aligned} \mathcal {P}(t)&= \epsilon (t) \left[ \Gamma (t) \otimes R(t-t^\prime )\right] , \nonumber \\ \Gamma (t)&= \Gamma _s e^{-\Gamma _s t} \left[ \cosh (\Delta \Gamma _s t /2) \right. \nonumber \\&\quad \left. + q^\mathrm{mix} (1-2 \tilde{\omega } (\eta )) \cos ( \Delta m_s t) \right] , \end{aligned}$$
(8)

where \(t^\prime \) and t are the true and reconstructed decay time of the \({{B} ^0_{s}} \) meson, respectively, and \(\Gamma (t)\) is the \({{B} ^0_{s}} \) decay rate. The decay time and the decay-time uncertainty are estimated from a kinematic fit [60] in which the \({{D} ^-_{s}} \pi ^+\) candidate is constrained to originate from the PV. The decay-time efficiency is empirically parameterised as \(\epsilon (t) =\) \(1 - 1/(1+ (at)^n + b)\), and \(R(t-t^\prime )\) is the decay-time resolution model. Here \(q^\mathrm{mix} = +1\) \((-1)\) if the \({{B} ^0_{s}} \) meson has (has not) changed flavour between its production and decay, determined by comparing the flavour-tagging decision and charge of the pion. A linear relationship between the true and estimated mistag probabilities is assumed, as given in Eq. (5).

Approximately \(70\,000\) same-side flavour-tagged \({{B} ^0_{s}} \rightarrow {{D} ^-_{s}} \pi ^+\) decays, with \({{D} ^-_{s}} \rightarrow {{K} ^+} {{K} ^-} \pi ^-\), are selected with similar requirements as in Ref. [59]. Due to trigger requirements, only candidates with \(p_{\mathrm {T}} ({{B} ^0_{s}})\) larger than \(2\mathrm {\,Ge}{\mathrm {V\!/}c^2} \) are used to perform the calibration. Figure 8 shows the distribution of \(m({{D} ^-_{s}} \pi ^+)\) for the selected sample. Superimposed is the result of a fit with a model composed of a signal contribution described by a Hypatia with tail parameters fixed to those from simulation and a combinatorial background component modelled by an exponential function. In addition, template shapes for several peaking backgrounds (\({{B} ^0_{s}} \rightarrow {{D} ^{\pm }_{s}} K^{\mp }\), \({{B} ^0} \rightarrow {{D} ^-_{s}} \pi ^+\), \({{\overline{\varLambda }} {}^0_{b}} \rightarrow {{\overline{\varLambda }} {}^-_{c}} \pi ^+\), \({{B} ^0_{s}} \rightarrow {{D} ^{*-}_{s}} \pi ^+\) and \({{B} ^0_{s}} \rightarrow {{D} ^-_{s}} \rho ^+\)) are evaluated from simulation and included in the fit model. The yield of the peaking backgrounds is determined from a fit to \(m({{D} ^-_{s}} \pi ^+)\) in the mass range 5100–\(5600\mathrm {\,Me}{\mathrm {V\!/}c^2} \). Using the fit results, the yield is extrapolated to the narrower region 5300–\(5600\mathrm {\,Me}{\mathrm {V\!/}c^2} \) and fixed in the subsequent \(m({{D} ^-_{s}} \pi ^+)\) fit, which is used to compute sWeights for background subtraction as in the OS calibration. The \({{B} ^0_{s}} \rightarrow {{D} ^-_{s}} \pi ^+\) sample is also weighted to match the background-subtracted \({{B} ^0_{s}} \rightarrow J/\psi \phi \) sample in the distributions of charged-track and PV multiplicities and the \(p_{\mathrm {T}}\) and rapidity of the \({{B} ^0_{s}} \) meson.

Fig. 9
figure 9

Variation of the effective single-Gaussian decay-time resolution, \(\sigma _\mathrm{eff}\), as a function of the estimated per-event decay-time uncertainty, \(\delta _t\), obtained from the prompt \({{D} ^-_{s}} \pi ^+\) sample. The red line shows the result of a linear fit to the data and the yellow band its uncertainty within one standard deviation

To calibrate the decay-time resolution in Eq. (8), a sample of promptly produced \({{D} ^-_{s}} \pi ^+\) candidates is selected following the requirements defined in Ref. [61]. The procedure to obtain the calibration for the decay-time resolution is similar to that described in Sect. 4. An unbinned maximum-likelihood fit is made to the \({{D} ^-_{s}} \) candidate invariant-mass distribution in 18 bins of \(\delta _t\). The model consists of a single Gaussian component for the signal and a second-order polynomial for the combinatorial background. From this fit, sWeights are computed that are used to subtract the background contribution in an unbinned fit to the decay-time distribution in each \(\delta _t\) bin. The model for this fit is composed of two Gaussian functions with a common mean and different widths. Only candidates with reconstructed decay time in the range from \(-1.0\) to \(0.1{\mathrm { \,ps}} \) are fitted. At such low values the longer-lived background components can be neglected. The effective single-Gaussian resolution is calculated from the double-Gaussian model using Eqs. (2) and (3). The variation of the effective resolution with the average value of \(\delta _t\) in each bin is shown in Fig. 9. From a binned fit using a linear calibration function, \(\sigma _\mathrm{eff}(\delta _t) = c_0 + c_1\delta _t\), the calibration constants are determined to be \(c_0 = 18.8 \pm 1.0\) \(\mathrm { \,fs}\) and \(c_1 = 1.03 \pm 0.02\), where the uncertainties are statistical only. Applying a similar procedure to a sample of simulated \({{B} ^0_{s}} \rightarrow {{D} ^-_{s}} \pi ^+\) decays indicates a difference in the calibration parameters between prompt \({{D} ^-_{s}} \pi ^+\) candidates and \({{B} ^0_{s}} \rightarrow {{D} ^-_{s}} \pi ^+\) decays. A systematic uncertainty of 0.1 is assigned to \(c_1\) to account for this difference.

Fig. 10
figure 10

a Distribution of the decay time for \({{B} ^0_{s}} \rightarrow {{D} ^-_{s}} \pi ^+\) candidates tagged as mixed and unmixed with the projection of the fit result, which is described in the text. b Calibration of the SSK tagger using \({{B} ^0_{s}} \rightarrow {{D} ^-_{s}} \pi ^{+}\) decays. The black points show the average measured mistag probability, \(\omega \), in bins of predicted mistag, \(\eta \), the red line shows the calibration obtained from the fit described in the text, and the yellow area the calibration uncertainty within one standard deviation. The shaded histogram shows the distribution of \(\eta \) in the background subtracted \({{B} ^0_{s}} \rightarrow J/\psi \phi \) sample

To determine the SSK tagger calibration parameters from the \({{B} ^0_{s}} \rightarrow {{D} ^-_{s}} \pi ^+\) decay candidates, an unbinned maximum-likelihood fit, which uses the PDF of Eq. (8), is performed. Uncertainties due to the use of external measurements [46] of \(\Gamma _s\), \(\Delta \Gamma _s\), \(\Delta m_s\), and the decay-time resolution parameters (\(c_0\), \(c_1\)) are accounted for via Gaussian constraints in the likelihood function. The parameters of the SSK tagger calibration and decay-time efficiency are free in the fit. Figure 10 shows the result of this fit, split by decays that are tagged as being mixed or unmixed, and the obtained relation between \(\omega \) and \(\eta \). Also shown are the values of the measured mistag in bins of estimated mistag, which supports the use of a linear calibration function for \(\omega (\eta )\). The final calibration parameters are given in Table 1. Two sources of systematic uncertainty are studied in addition to the knowledge of the time resolution, which is incorporated in the statistical uncertainty. The first and larger one is due to the applicability of the calibration from \({{B} ^0_{s}} \rightarrow {{D} ^-_{s}} \pi ^+\) to \({{B} ^0_{s}} \rightarrow J/\psi \phi \) decays. It is tested using simulated events and the observed difference between the calibrations is assigned as a systematic uncertainty. Variations in the parameters through the use of a different model for the combinatorial background in the fit to the \({{B} ^0_{s}} \rightarrow {{D} ^-_{s}} \pi ^+\) invariant-mass distribution are treated as systematic uncertainties. The tagging asymmetry parameters, \(\Delta p_0\) and \(\Delta p_1\), are both assumed to be \(0.00\pm 0.03\). The uncertainty is estimated by studying the tagging calibration using a sample of over 3.1 million promptly produced \({{D} ^-_{s}} \rightarrow {{K} ^+} {{K} ^-} \pi ^-\) decays, with the method described in Ref. [59]. The overall tagging power for \({{B} ^0_{s}} \rightarrow J/\psi K^{+}K^{-}\) candidates with only an SSK tag can be found in Table 2.

7.3 Tagger combination

Approximately 31% of the tagged candidates in the \({{B} ^0_{s}} \rightarrow J/\psi K^{+}K^{-}\) sample are tagged by both the OS and the SSK algorithms. Since the algorithms are uncorrelated, as they select mutually exclusive charged particles, the two tagging results are combined taking into account both decisions and their corresponding estimate of \(\eta \). The combined estimated mistag probability and the corresponding uncertainties are obtained by combining the individual calibrations for the OS and SSK tagging and propagating their uncertainties. The effective tagging power and efficiency for these both OS and SSK tagged candidates is given in Table 2.

8 Maximum-likelihood fit

The maximum-likelihood fitting procedure is similar to that in Ref. [27], the only major differences being the treatment of the decay-time efficiency and that the quantity \({\Gamma _{{s}}}-{\Gamma _{{d}}} \) is measured instead of \(\Gamma _s\). It has been checked via pseudoexperiments that, given that the decay-time efficiency is obtained using \(\Gamma _d\) as an input parameter (see Sect. 5), the fitted value of \({\Gamma _{{s}}}-{\Gamma _{{d}}} \) and its uncertainty are independent of the value and uncertainty of \(\Gamma _d\). This strategy has the advantage that the measured value of \({\Gamma _{{s}}}-{\Gamma _{{d}}} \) can be combined with the most up-to-date value of \(\Gamma _d\) to obtain \(\Gamma _s\) or \(\Gamma _s/\Gamma _d\).

Each candidate i is given a signal weight \(W_i\) using the sPlot method with \(m(J/\psi {{K} ^+} {{K} ^-})\) as a discriminating variable and \(\sigma _m\) as a conditional variable as explained in Sect. 3. A weighted fit is then performed to the \({B} ^0_{s} \) decay time and helicity-angle distributions using a PDF that describes only the signal. The log-likelihood in each of the 24 data subsamples is scaled by a per-sample factor \(\alpha = \sum _i W_i / \sum _i W_i^2 \) to account for the effect of the weights in the determination of the parameter uncertainties [34].

The distribution of the decay time and angles for a \({B} ^0_{s} \) meson produced at time \(t=0\) is described by a sum of ten terms, corresponding to the four polarisation amplitudes squared and their interference terms. Each of these is given by the product of a decay-time-dependent function and an angular function

$$\begin{aligned} \frac{\mathrm {d}^{4} \Gamma ({{B} ^0_{s}} \rightarrow J/\psi K^{+}K^{-}) }{\mathrm {d}t \;\mathrm {d}\Omega } \; \propto \; \sum ^{10}_{k=1} \, N_{k} \, h_k(t) \, f_k( \Omega ), \end{aligned}$$
(9)

with

$$\begin{aligned} h_k(t|B^{0}_{s})&= \frac{3}{4\pi }e^{-\Gamma t}\left( a_{k}\cosh \frac{\Delta \Gamma t}{2} + b_{k}\sinh \frac{\Delta \Gamma t}{2} \right. \nonumber \\&\quad \left. + c_{k}\cos (\Delta m t) + d_{k}\sin (\Delta m t) \right) , \end{aligned}$$
(10)
$$\begin{aligned} h_k(t|\bar{B}^{0}_{s})&= \frac{3}{4\pi }e^{-\Gamma t}\left( a_{k}\cosh \frac{\Delta \Gamma t}{2} + b_{k}\sinh \frac{\Delta \Gamma t}{2} \right. \nonumber \\&\quad \left. - c_{k}\cos (\Delta m t) - d_{k}\sin (\Delta m t) \right) , \end{aligned}$$
(11)

where the definition of the parameters \(N_{k}\), \(a_{k}\), \(b_{k}\), \(c_{k}\), \(d_{k}\) and of the function \(f_k( \Omega )\) can be found in Table 3.

Table 3 Angular and time-dependent functions used in the fit to the data. Abbreviations used include \(c_{K}=\cos \theta _K\), \(s_{K}=\sin \theta _K\), \(c_{l}=\cos \theta _l\), \(s_{l}=\sin \theta _l\), \(c_{\phi }=\cos \phi \) and \(s_{\phi }=\sin \phi \)

The interference between the different S- and P-wave contributions is accounted for via an effective coupling factor, \(C_\mathrm{SP}\). The \(C_\mathrm{SP}\) factors are computed by integrating the interference between the S- and P-wave contributions in each of the six \(m({{K} ^+} {{K} ^-})\) bins in which the analysis is performed, using the same strategy as in the previous analysis. They are applied by multiplication to the relevant terms in Eq. (9). The \(C_\mathrm{SP}\) factors are unity for terms involving P-wave and S-wave amplitudes only (\(k<8\)). In the determination of the \(C_\mathrm{SP}\) factors, the \(m({{K} ^+} {{K} ^-})\) lineshape of the P-wave component is described by a relativistic Breit–Wigner distribution, while the S-wave is taken as an \(f_0(980)\) resonance modelled as a Flatté amplitude with parameters from Ref. [62]. The \(C_{\mathrm {SP}}\) correction factors are calculated to be 0.8463, 0.8756, 0.8478, 0.8833, 0.9415 and 0.9756 from the lowest to the highest \(m({{K} ^+} {{K} ^-})\) bin. Their effect on the fit results is small and is discussed further in Sect. 9, where three different S-wave lineshapes are considered to assign a systematic uncertainty. The PDF considers four disjoint tagging cases: only OS tagged candidates, only SSK-tagged, OS and SSK tagged, and untagged candidates. Taking into account all detector response effects, the full PDF is conditional upon the mistag probability and the estimated decay-time uncertainty.

A simultaneous fit is made to the different subsamples, divided by \(m({{K} ^+} {{K} ^-})\) bin, year of data taking and trigger category. The PDF for each subsample, up to a normalisation constant, is given by

$$\begin{aligned}&\mathcal{P}\left( t,\Omega | \mathfrak {q}^\mathrm{OS}, \mathfrak {q}^\mathrm{SSK}, \eta ^\mathrm{OS}, \eta ^\mathrm{SSK}, \delta _{t} \right) \nonumber \\&\quad \propto \sum _{k=1}^{10} C^{k}_\mathrm{SP} N_{k} f_{k}(\Omega )\varepsilon ^{{{B} ^0_{s}}}_\mathrm{data}(t) \nonumber \\&\quad \cdot \left\{ \left[ \mathcal{Q}\left( \mathfrak {q}^\mathrm{OS}, \mathfrak {q}^\mathrm{SSK}, \eta ^\mathrm{OS}, \eta ^\mathrm{SSK}\right) h_k\left( t | {{B} ^0_{s}} \right) \right. \right. \nonumber \\&\quad \left. \left. + \bar{\mathcal{Q}}\left( \mathfrak {q}^\mathrm{OS}, \mathfrak {q}^\mathrm{SSK}, \eta ^\mathrm{OS}, \eta ^\mathrm{SSK}\right) \right. h_k\left( t | {{\overline{B}{}} {}^0_{s}} \right) \right] \otimes \left. \mathcal{R}\left( t -t' | \delta _{t}\right) \right\} , \end{aligned}$$
(12)

where \(\mathcal{R}\) is the time resolution function defined in Eq. (1) and the terms

$$\begin{aligned} \mathcal{Q}\left( \mathfrak {q}^\mathrm{OS}, \mathfrak {q}^\mathrm{SSK}, \eta ^\mathrm{OS}, \eta ^\mathrm{SSK}\right)&= \left[ 1+\mathfrak {q}^\mathrm{OS}\left( 1-2\omega \left( \eta ^\mathrm{OS}\right) \right) \right] \nonumber \\&\qquad \times \left[ 1+\mathfrak {q}^\mathrm{SSK}\left( 1-2\tilde{\omega }\left( \eta ^\mathrm{SSK}\right) \right) \right] , \end{aligned}$$
(13)
$$\begin{aligned} \bar{\mathcal{Q}}\left( \mathfrak {q}^\mathrm{OS}, \mathfrak {q}^\mathrm{SSK}, \eta ^\mathrm{OS}, \eta ^\mathrm{SSK}\right)&= \left[ 1-\mathfrak {q}^\mathrm{OS}\left( 1-2\bar{\omega }\left( \eta ^\mathrm{OS}\right) \right) \right] \nonumber \\&\qquad \times \left[ 1-\mathfrak {q}^\mathrm{SSK}\left( 1-2\tilde{\omega }\left( \eta ^\mathrm{SSK}\right) \right) \right] . \end{aligned}$$
(14)

account for the measured flavour of the \({B} ^0_{s} \) candidate. All physics parameters are free in the fit and are common across the subsamples, except for the S-wave fraction and the phase difference \(\delta _\mathrm{S}-\delta _{\perp }\), which are independent parameters for each \(m({{K} ^+} {{K} ^-})\) bin.

9 Systematic uncertainties

Systematic uncertainties on the measured physics parameters arise from a variety of sources that are described in the following. They are summarised in Table 4.

Table 4 Summary of the systematic uncertainties

Three systematic effects due to the \(m(J/\psi K^{+}K^{-})\) model and the sWeights computation are taken into account. Firstly, the systematic effect due to statistical uncertainties in the \(m(J/\psi K^{+}K^{-})\) fit model is estimated. For this the sWeights are recomputed after varying the fit parameters within their statistical uncertainties. The systematic uncertainties are obtained from the difference in fit results and are found to be negligible. Secondly, the average width of the double-sided CB distribution is parametrised as a linear function of the per-candidate mass uncertainty, instead of a quadratic one. The differences to the baseline result are assigned as systematic uncertainties on the mass shape. Thirdly, the assumption that the \(m(J/\psi K^{+}K^{-})\) distribution is independent of the decay time and angles is tested by re-evaluating the sWeights in bins of these observables, repeating the fit and assigning the differences in fit results as systematic uncertainties.

The main physics background contribution comes from misidentified \({{\varLambda } ^0_{b}} \rightarrow J/\psi p {{K} ^-} \) decays. Possible effects due to the limited knowledge of the size of this component are estimated by repeating the fit after varying the amount of this background by one standard deviation of its measured yield. The maximum difference is found to be negligible and thus no systematic uncertainties are assigned. A further systematic effect due to the \({{B} ^0} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) background is evaluated by repeating the \(m(J/\psi K^{+}K^{-})\) fit while leaving the mass resolution for this component free. A new set of sWeights are computed, leading to negligible systematic uncertainties. Finally, approximately 0.5% of \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) candidates come from the decays of \({{B} _{c} ^+} \) mesons via the \({{B} _{c} ^+} \rightarrow {{B} ^0_{s}} {{\pi } ^+} \) decay [63, 64]. The effect of ignoring this component in the fit is evaluated using simulated pseudoexperiments where \(0.5\%\) of the candidates are replaced with \({B} ^0_{s} \)-from-\({{B} _{c} ^+} \) decays that are randomly sampled from simulated \({{B} _{c} ^+} \rightarrow {{B} ^0_{s}} (\rightarrow J/\psi \phi )\pi ^+\) decays. This is found to have a negligible effect on all parameters.

In the baseline strategy, all candidates are retained even if multiple candidates are present in a single event. A part of these multiple candidates is found to peak in the \(m(J/\psi K^{+}K^{-})\) distribution, which introduces a bias in the physics parameters, mainly on \(\Gamma _s\). The peaking component is due to so-called clone candidates originating from final state tracks that are duplicated in the reconstruction process. Candidates are considered to be clones if they belong to the same event and their final-state tracks are separated by an angle smaller than 5 mrad. To assign systematic uncertainties, a single random candidate is selected among all clone candidates in an event and the fit is repeated. Approximately 0.35% (0.2%) of all selected \({{B} ^0_{s}} \) (\({{B} ^0} \)) candidates are removed. The maximum resulting variations of the fit values are assigned as systematic uncertainties.

Possible biases of the fitting procedure are studied by generating and fitting over eight thousand pseudoexperiments of the same size as the data. The biases are determined from the resulting pull distributions. The ones that are significantly different from zero are assigned as systematic uncertainties.

Different models of the S-wave lineshape based on the results in Ref. [65] are used to evaluate the coupling factors \(C_\mathrm{SP}\) in each of the six \(m({{K} ^+} {{K} ^-})\) bins, according to Ref. [31]. This includes an S-wave parametrisation with a cubic spline function determined from data, a variation of the \(f_{0}(980)\) pole and width parameters used in the baseline model within their uncertainties, and the variation of the \(f_{0}(980)\) parameters according to the second solution found in the analysis in Ref. [65]. The maximum resulting variations of the fit values, mostly due to the spline parametrisation, are assigned as systematic uncertainties.

The tagging parameters are Gaussian-constrained in the fit and therefore their uncertainties contribute to the statistical uncertainty of each fit value. This mainly affects the parameter \({\phi _{{s}}} \), with a contribution to the uncertainty of 15\(\mathrm { \,mrad}\). In addition, the calibration of the OS tagging is re-evaluated using a quadratic function instead of a linear one. The observed differences when repeating the fit are found to be negligible.

The systematic uncertainties associated with decay-time resolution originate from four different sources. The first is due to the statistical uncertainties on the calibration parameters and is found to be negligible. The second is related to the assumption that the resolution model obtained in the calibration sample applies also to the signal sample. The corresponding systematic uncertainty is determined by evaluating the ratio of the calibration effective resolutions, obtained from the simulated samples of the calibration and signal decays, and using it to scale the effective resolutions in the prompt data sample. These scaled effective resolutions are then described by a quadratic function and used to determine the physics parameters. The differences with respect to the baseline result are assigned as systematic uncertainties. A third source of uncertainty is due to a possible bias of the Gaussian resolution mean, which is assumed to be zero in the baseline model. A quadratic dependence of the mean on the decay-time uncertainty is observed in the calibration sample, with a maximum deviation of about \(5\mathrm { \,fs} \) from zero. It is modelled in the prompt data sample after weighting it in order to match the signal data sample. Corresponding systematic uncertainties are evaluated as the differences between the results obtained with this bias and the baseline model. Finally, the fourth systematic effect is estimated by varying the contribution in the fit of candidates with an associated wrong origin vertex. The fraction of these candidates is varied between 0 and 1.5%, corresponding to about three times the fraction that is measured in the calibration sample, the calibration updated and the fit to data repeated. The maximum deviations from the baseline fit are assigned as systematic uncertainties.

The angular efficiency is determined from simulated signal, weighted such that the kinematic distributions of the final-state particles match those in the data. Systematic uncertainties are assigned to account for the limited size of the simulated sample by varying the normalisation weights according to their uncertainties and their covariance matrix and repeating the fit with a new varied set of weights. The resulting RMS of each fitted observable is taken as a systematic uncertainty. In addition, the impact of the specific configuration of the gradient-boost tree method used in the reweighting of the simulation is studied by testing approximately one hundred alternative configurations. The maximal deviations from the fit result obtained with the default angular efficiency are assigned as systematic uncertainties. The differences between the fit results obtained using angular corrections from the baseline or alternative weighting procedures of the simulated candidates are also considered as systematic uncertainties. An imperfect removal of clone candidates, in simulation, that peak in the \({B} ^0_{s} \) candidate mass is tested as follows. The peaking component is separated from the underlying background via sWeights using all simulated events to determine its shape. As an alternative, it is modelled according to the distribution of the corresponding background classification that is available in simulations, which however is limited by the small sample size and is therefore not used as the baseline strategy. In addition, the two components are separated by matching the reconstructed daughter particles to the simulated particles by comparing their track momentum magnitudes and directions. The angular efficiency is determined according to these two changes and the larger differences are assigned as systematic uncertainties. Finally, from a fit to several simulated samples of the same size as data, uncertainties are evaluated as the differences between fitted and generated values to account for correlations between the angular efficiency and the decay time as well as the decay-time uncertainty. Such correlations are neglected in the baseline fit.

Several sources of systematic uncertainties related to the determination of the spline-based decay-time efficiency are studied and found to be small. First, the effect due to the limited size of the data and simulated samples is estimated by repeating the fit several times with the spline coefficients varied according to their covariance matrix and the RMS of the fitted observable distributions is taken as systematic uncertainties. Two further contributions are evaluated by taking the difference between the baseline fit and alternative fits where the time efficiency is determined without applying either the kinematic or the PDF weighting procedures used to correct the physics parameters of \({{B} ^0_{s}} \) and \({{B} ^0} \) simulated samples. Next, the number of spline nodes is doubled and found to have a negligible effect on the result. Another systematic uncertainty source is due to the differences observed in decay-time efficiency derived from the simulated samples with \(\Delta \Gamma _s\) equal to or different from zero. It has been also checked that varying the decay-time resolutions used in the determination of the decay-time efficiency by \(10\%\) has negligible effects.

The uncertainty on the LHCb length scale is estimated to be 0.022% [66], as determined from metrology and track-based alignment. This translates directly into an uncertainty on \({\Gamma _{{s}}}-{\Gamma _{{d}}} \), \(\Delta \Gamma _s\) and \(\Delta m_s\), which is non-negligible only in the case of \(\Delta m_s\). Other parameters are unaffected. The precision on the track momentum scale is \(0.03\%\). Its effect largely cancels in the computation of the decay time, leading to negligible uncertainties on all observables.

Asymmetries between \({B} ^0_{s} \) and \({\overline{B}{}} {}^0_{s} \) production rates are diluted by the fast oscillation between particle and antiparticle. They are found to have a negligible effect on the fit parameters.

No statistically significant systematic effect on the results is observed when repeating the analysis on subsets of the data, splitting by magnet polarity, trigger conditions, year of data taking, number of primary vertices, bins of \({B} ^0_{s} \) \(p_{\mathrm {T}}\), pseudorapidity and decay-time uncertainty.

10 Results

The results of the maximum-likelihood fit described in Sect. 8 are

$$\begin{aligned} \phi _{s}&= -0.083\pm 0.041\pm 0.006\mathrm { \,rad} \nonumber \\ |\lambda |&= 1.012\pm 0.016\pm 0.006 \nonumber \\ \Gamma _{s}-\Gamma _{d}&= -0.0041\pm 0.0024\pm 0.0015{\mathrm { \,ps^{-1}}} \nonumber \\ \Delta \Gamma _{s}&= 0.077\pm 0.008\pm 0.003{\mathrm { \,ps^{-1}}} \nonumber \\ \Delta m_{s}&= 17.703\pm 0.059\pm 0.018 {\mathrm { \,ps^{-1}}} \nonumber \\ |A_{\perp }|^2&= 0.2456\pm 0.0040\pm 0.0019\nonumber \\ |A_0|^2&= 0.5186\pm 0.0029\pm 0.0024 \nonumber \\ \delta _\perp -\delta _0&= 2.64\pm 0.13\pm 0.10 \mathrm { \,rad} \nonumber \\ \delta _\parallel -\delta _0&= 3.06^{\,+\,0.08}_{\,-\,0.07}\pm 0.04 \mathrm { \,rad}. \end{aligned}$$
(15)

The S-wave fractions and phase differences with respect to \(\delta _{\perp }\) in each \(m(K^{+}K^{-})\) bin are summarized in Appendix 13. The background-subtracted data distributions with fit projections are shown in Fig. 11.

The results are in good agreement with the previous LHCb measurement. The measurements of \(\phi _s\), \(\Delta \Gamma _s\) and \({\Gamma _{{s}}}-{\Gamma _{{d}}} \) are the most precise to date and agree with the SM expectations [4, 5, 18, 19]. The results also indicate no \(CP\) violation in \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) decays. The value of \(\Delta m_{s}\) is in a good agreement with the world average value [46]. Relaxing the assumption that \(\lambda _r\) is the same for all polarisation states and repeating the fit shows no evidence for any polarisation dependence. The correlation matrix including systematic uncertainties can be found in Table 5.

Table 5 Correlation matrix including the statistical and systematic correlations between the parameters
Fig. 11
figure 11

Decay-time and helicity-angle distributions for background subtracted \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) decays (data points) with the one-dimensional projections of the PDF at the maximum-likelihood point. The solid blue line shows the total signal contribution, which contains (long-dashed red) \(CP\)-even, (short-dashed green) \(CP\)-odd and (dotted-dashed purple) S-wave contributions. Data and fit projections for the different samples considered (data-taking year, trigger and tagging categories, \(m(K^{+}K^{-})\) bins) are combined

11 Combination with other results

The results presented in this paper are combined with related Run 1 and Run 2 LHCb measurements, taking into account all statistical correlations, all systematic errors and their correlations, and correlations between different run periods.

11.1 Combination with Run 1 \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \)

The measurements presented in this paper are consistent with those obtained from the analysis of the data collected by LHCb during the LHC Run 1 [27]. The Run 1 measurements are combined with the results of this analysis taking into account a covariance matrix that includes the statistical uncertainties with their correlations, and the systematic uncertainties with their correlations, both between the parameters in a single run period and between the two run periods.

The sources of systematic uncertainty that are correlated between the analyses are the applicability of the time resolution obtained from the prompt control sample on the signal sample, the \(C_\mathrm{SP}\) factors, the correction of simulation for the angular efficiency determination, and the length scale. In the case of the angular efficiency, a correlation matrix is determined from the RMS distributions of the parameters in Run 2 and the same matrix is taken to account for correlations between Run 1 and Run 2. For all other sources of systematic uncertainty no correlation is assumed. For the parameters showing asymmetric uncertainties, the larger uncertainty has been used in the combination.

It has been verified that using the average of the two asymmetric uncertainties does not change the combination, and that completely ignoring the systematic correlations has a negligible effect. In the Run 1 measurement, \(\Gamma _{s}\) was measured instead of \(\Gamma _{s}-\Gamma _{d}\), hence a linear transformation is taken into account in the combination, constraining \(\Gamma _d\) to the known value [32]. The combined results are

$$\begin{aligned} \phi _{s}&= -0.080\pm 0.032\mathrm { \,rad} \,, \nonumber \\ |\lambda |&= 0.993\pm 0.013\,, \nonumber \\ \Gamma _{s}&= 0.6570\pm 0.0023{\mathrm { \,ps^{-1}}} \,,\nonumber \\ \Delta \Gamma _{s}&= 0.0784\pm 0.0062{\mathrm { \,ps^{-1}}} \,,\nonumber \\ \Delta m_{s}&= 17.691 \pm 0.042{\mathrm { \,ps^{-1}}} \,,\nonumber \\ |A_{\perp }|^2&= 0.2486 \pm 0.0035\,, \nonumber \\ |A_0|^2&= 0.5197 \pm 0.0035\,, \nonumber \\ \delta _\perp -\delta _{0}&= 2.88 \pm 0.11\mathrm { \,rad} \,,\nonumber \\ \delta _\parallel -\delta _{0}&= 3.155 \pm 0.079\mathrm { \,rad}. \end{aligned}$$
(16)

The correlation matrix can be found in Table 6. The correlation between \(\Gamma _{s}\) and \(\Gamma _d\) is 0.39. The combined value of \(\phi _s\) is 2.5 standard deviations from zero and agrees with expectations based on the SM [4, 5].

Table 6 Correlation matrix for the results in Eq. (16) taking into account correlated systematics between Run 1 and the 2015 and 2016 results

11.2 Combination with other LHCb \(\phi _s\) results

The results obtained in the previous section are further combined with the recent results from \({{B} ^0_{s}} \rightarrow J/\psi {{\pi } ^+} {{\pi } ^-} \) [67] decays, and the Run 1 results from \({{B} ^0_{s}} \rightarrow J/\psi {{\pi } ^+} {{\pi } ^-} \) [28], \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) for the \({{K} ^+} {{K} ^-} \) invariant mass region above 1.05\(\mathrm {\,Ge}{\mathrm {V\!/}c^2}\)  [31], \({{B} ^0_{s}} \rightarrow \psi (2S)\phi \) [29] and \({{B} ^0_{s}} \rightarrow D^+_sD^-_s\) [30] decays.

The Run 1 analysis of \({{B} ^0_{s}} \rightarrow J/\psi {{\pi } ^+} {{\pi } ^-} \) decays measured \(|\lambda |\) and \(\phi _s\) assuming a value of \({\Delta m_{{s}}} \) fixed to \(17.768\pm 0.024{\mathrm { \,ps^{-1}}} \). Before the combination, this value is updated to \(17.711\pm 0.059{\mathrm { \,ps^{-1}}} \) [27], and the analysis is repeated to obtain updated values of \(|\lambda |\) and \(\phi _s\). The analysis of 2015 and 2016 data, instead, measured \(\phi _s\), \(|\lambda |\) and \(\Gamma _\mathrm{H} - \Gamma _d\), assuming the \(\Delta m_s\) value determined in this analysis. In the combination, \(\Gamma _\mathrm{H} - \Gamma _d\) is parametrised as \(\Gamma _{s}-\Gamma _{d}-\Delta \Gamma _s/2\), and the value of \(\Gamma _d\) is constrained to the known value. The combined values are

$$\begin{aligned} \phi _{s}&= -0.041 \pm 0.025\mathrm { \,rad} \,, \nonumber \\ |\lambda |&= 0.993 \pm 0.010\,, \nonumber \\ \Gamma _{s}&= 0.6562 \pm 0.0021{\mathrm { \,ps^{-1}}} \,,\nonumber \\ \Delta \Gamma _{s}&= 0.0816 \pm 0.0048{\mathrm { \,ps^{-1}}} \,. \end{aligned}$$
(17)

The correlation matrix can be found in Table 7. The correlation between \(\Gamma _{s}\) and \(\Gamma _{d}\) is 0.48.

Table 7 Correlation matrix for the results in Eq. (17) obtained taking into account correlated systematics between the considered analyses

The values of these parameters are the most precise to date. Figure 12 shows the 68% confidence level regions in the \(\phi _s\) vs. \(\Delta \Gamma _s\) plane for the considered analyses and the LHCb combination. The combined value of \(\phi _s\) is consistent with global fits to data. The parameter \(|\lambda |\) agrees with the hypothesis of no \(CP\) violation in the decay. The values of \(\Gamma _s\) and \(\Delta \Gamma _s\) are consistent with expectations from HQE models.

Fig. 12
figure 12

Regions of 68% confidence level in the \(\phi _s\)-\(\Delta \Gamma _s\) plane for the individual LHCb measurements and a combined contour (in blue). The \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) (magenta) and \({{B} ^0_{s}} \rightarrow J/\psi {{\pi } ^+} {{\pi } ^-} \) [67] (red) contours show the Run 1 and Run 2 combined numbers. The \(\phi _s\) [4] and \(\Delta \Gamma _{s}\) [18] predictions are indicated by the thin black rectangle

12 Conclusions

In summary, a flavour-tagged decay-time-dependent angular analysis of \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) decays has been performed, using 1.9\(\,\mathrm{fb}^{-1}\) of pp collision data recorded by the LHCb experiment during the 2015 and 2016 runs of the LHC. Approximately \(117\,000\) signal decays are selected, with a decay-time resolution of about 45\(\mathrm { \,fs}\) and a tagging power of 4.7%. The \(CP\)-violating phase \({\phi _{{s}}} \) is measured to be \(-0.083\pm 0.041\pm 0.006\) \(\mathrm { \,rad}\), the decay width difference of the \({{B} ^0_{s}} \) mass eigenstates, \(\Delta \Gamma _s = 0.077\pm 0.008\pm 0.003\) \({\mathrm { \,ps^{-1}}}\), and the difference of the average decay widths of the \({B} ^0_{s} \) and \({B} ^0\) mesons, \(\Gamma _s-\Gamma _d = -0.0041\pm 0.0024\pm 0.0015\) \({\mathrm { \,ps^{-1}}}\). Using the known value for the \({B} ^0\) meson lifetime \(1.520\pm 0.004{\mathrm { \,ps}} \) [32], the ratio of \({B} ^0_{s} \) and \({B} ^0\) meson decay widths is measured to be \(\Gamma _s/\Gamma _d = 0.9938 \pm 0.0036 \pm 0.0023\). All results are shown with first the statistical and second the systematic uncertainty. These are the single most precise measurements of these quantities to date. In addition, the mass difference between the \({B} ^0_{s} \) mass eigenstates is measured to be \(\Delta m_{s}=17.703\pm 0.059\pm 0.018{\mathrm { \,ps^{-1}}} \). All results are consistent with theoretical predictions based on the SM [4, 5]. The \(CP\)-violating parameters are also determined assuming that they are not the same for all \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) polarisation states and no polarisation dependence is observed.

The measurements presented here for the parameters \({\phi _{{s}}} \), \(|\lambda |\), \({\Gamma _{{s}}}-\Gamma _{d}\), \({\Delta \Gamma _{{s}}} \), \({\Delta m_{{s}}} \), \(|A_{\perp }|^{2}\), \(|A_{0}|^{2}\), \(\delta _\perp -\delta _{0}\) and \(\delta _\parallel -\delta _{0}\) are consistent with those from \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) decays obtained using data collected by the LHCb experiment during Run 1 of the LHC [27]. The two sets of measurements are combined accounting for the statistical and systematic correlations between parameters in each and the systematic correlations between the two run periods. The combined values are \(\phi _s = -0.080 \pm 0.032\) \(\mathrm { \,rad}\), \(|\lambda | = 0.993 \pm 0.013\), \(\Gamma _s = 0.6570 \pm 0.0023{\mathrm { \,ps^{-1}}} \), \(\Delta \Gamma _s = 0.0784\pm 0.0062{\mathrm { \,ps^{-1}}} \) and \(\Delta m_{s}= 17.691\pm 0.042{\mathrm { \,ps^{-1}}} \). The value of \(\phi _s\) is 2.5 standard deviations from zero and consistent with theoretical predictions based on the SM [4, 5].

The results are further combined with the recent results from \({{B} ^0_{s}} \rightarrow J/\psi {{\pi } ^+} {{\pi } ^-} \) [67], and the Run 1 results from \({{B} ^0_{s}} \rightarrow J/\psi {{\pi } ^+} {{\pi } ^-} \) [28], \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) for the \({{K} ^+} {{K} ^-} \) invariant mass region above 1.05\(\mathrm {\,Ge}{\mathrm {V\!/}c^2}\)  [31], \({{B} ^0_{s}} \rightarrow \psi (2S)\phi \) [29] and \({{B} ^0_{s}} \rightarrow D^+_sD^-_s\) [30]. The combined values are \(\phi _s = -0.041 \pm 0.025\) \(\mathrm { \,rad}\), \(|\lambda | = 0.993 \pm 0.010\), \({\Gamma _{{s}}} = 0.6562 \pm 0.0021\) \({\mathrm { \,ps^{-1}}}\) and \({\Delta \Gamma _{{s}}} = 0.0816 \pm 0.0048\) \({\mathrm { \,ps^{-1}}}\). These values are consistent with theoretical predictions based on the SM [4, 5]. In particular, the value of \({\phi _{{s}}} \) is consistent with a non-zero \(CP\)-violation predicted within the SM and with no \(CP\)-violation in the interference of \({B} ^0_{s} \) meson mixing and decay. The parameter \(|\lambda |\) is consistent with unity, implying no evidence for direct \(CP\)-violation in \({{B} ^0_{s}} \rightarrow J/\psi {{K} ^+} {{K} ^-} \) decays.