Observation of $B^0_s$-$\bar{B}^0_s$ mixing and measurement of mixing frequencies using semileptonic B decays

The $B^0_s$ and $B^0$ mixing frequencies, $\Delta m_s$ and $\Delta m_d$, are measured using a data sample corresponding to an integrated luminosity of 1.0 fb^{-1} collected by the LHCb experiment in $pp$ collisions at a centre of mass energy of 7 TeV during 2011. Around 1.8x10^6 candidate events are selected of the type $B^0_{(s)} \to D^-_{(s)} \mu^+$ (+ anything), where about half are from peaking and combinatorial backgrounds. To determine the B decay times, a correction is required for the momentum carried by missing particles, which is performed using a simulation-based statistical method. Associated production of muons or mesons allows us to tag the initial-state flavour and so to resolve oscillations due to mixing. We obtain \Delta m_s = (17.93 \pm 0.22 (stat) \pm 0.15 (syst)) ps^{-1}, \Delta m_d = (0.503 \pm 0.011 (stat) \pm 0.013 (syst)) ps^{-1}. The hypothesis of no oscillations is rejected by the equivalent of 5.8 standard deviations for $B^0_s$ and 13.0 standard deviations for $B^0$. This is the first observation of $B^0_s$ mixing to be made using only semileptonic decays.


Introduction
B 0 s and B 0 mesons propagate as superpositions of particle and antiparticle flavour states. For a flavour-specific decay process 1 such as B 0 → D − µ + ν, particle-antiparticle mixing lends a sinusoidal component to the decay rates [1,2]. To measure mixing, the flavour state of the B meson must be observed to change, which requires knowledge of the state from at least two points in time. The experimentally accessible times to determine the flavour are at production and decay. Neglecting CP violation in mixing, the decay rate N at a proper decay time t simplifies to where ∆Γ and ∆m are the width and mass differences 2 of the two mass eigenstates, and Γ is the average decay width [2]. The positive sign applies when the B meson decays with the same flavour as its production and the negative sign when the particle decays with opposite flavour to its production, later referred to as "even" and "odd". In this study, a sample of semileptonic decays obtained with the LHCb detector is used to measure the mixing frequencies ∆m s and ∆m d for the B 0 s and B 0 systems. These quantities have previously been measured to high precision, usually in the combination of several channels, relying heavily on hadronic decay modes (see for example Refs. [3,4] and our recent results, Refs. [5][6][7]). To date no observation of B 0 s mixing has been made using only semileptonic decay channels.

Experimental setup
The LHCb detector [8] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed for the study of particles containing b or c quarks. The detector consists of several dedicated subsystems, organized successively further from the interaction region. A silicon-strip vertex detector surrounds the pp interaction region and approaches to within 8 mm of the proton beams. The first of two ring-imaging Cherenkov (RICH) detectors comes next, followed by the remainder of the tracking system, which comprises, in order: a large-area silicon-strip detector; a dipole magnet with a bending power of about 4 Tm; and three multilayer tracking stations, each with central silicon-strip detectors and peripheral straw drift tubes. After this comes the second RICH detector, the calorimeter and the muon stations.
The combined high-precision tracking system provides a momentum measurement with relative uncertainty that varies from 0.4 % at 5 GeVc −1 to 0.6 % at 100 GeVc −1 , and impact parameter 3 resolution of 20 µm for tracks with high transverse momentum. By combining information from the two RICH detectors [9] charged hadrons can be identified across a 1 In this paper, charge conjugate modes are always implied. 2 The mass difference is measured here as an angular frequency, in units of inverse time. 3 The impact parameter is the distance of closest approach of a track to a primary interaction vertex. 1 wide range in momentum, around 2 to 150 GeVc −1 . The calorimeter system consists of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter, allowing identification of photon, electron and hadron candidates. Muons that pass through the calorimeters are detected using a system of alternating layers of iron and multiwire proportional chambers [10]. Triggering of events is performed in two stages [11]: a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which performs full event reconstruction.

Data selection and reconstruction
The LHCb dataset used in this analysis corresponds to an integrated luminosity of 1.0 fb −1 collected in pp collisions at a centre of mass energy of 7 TeV during the 2011 physics run at the LHC. Where simulation is required, Pythia 6.4 [12] is used, with a specific LHCb configuration [13]. Decays of hadronic particles are described by EvtGen [14], in which final-state radiation is generated using Photos [15]. The interaction of the generated particles with the detector and the detector response are implemented using the Geant4 toolkit [16] as described in Ref. [17]. Input to EvtGen is taken from the best knowledge of branching fractions (B) and form factors at the time of the simulation [1]. The same reconstruction and selection is applied on simulated and detector data.
A sample of events is selected in which a D + (s) → K + K − π + candidate forms a vertex with a muon candidate. A cut-based selection is applied to enhance the fraction of real D + (s) mesons in this sample that arise from B 0 (s) semileptonic decays. Vertex and track reconstruction qualities, momenta, invariant masses, flight distances and particle identification (PID) variables are used. The selection was initially optimized on simulated data to maximize the signal significance, S/ (S + B), where S (B) denotes the number of selected signal (background) candidates. The most important cuts for this analysis are those on the PID and invariant masses. Combined information from the RICH detectors, muon stations, calorimeters and tracking allows us to place stringent requirements on a log-likelihood based PID parameter for each final-state particle separately, ensuring at least 99 % purity in the muon sample, and suppressing peaking backgrounds such as D + → K − π + π + decays, where a pion has been misidentified as a kaon. To allow a simultaneous measurement of ∆m s and ∆m d , a broad mass window for the K + K − π + system is used to cover both the D + and D + s masses, −0.
. Decays of the type D * (2010) + → D 0 π + are additionally suppressed by requiring that the invariant mass of the two kaons M (K + K − ) < 1.84 GeVc −2 , and combinatorial background with slow collinear pions is similarly removed with the mass requirement M ( Simulation studies indicate that the selected sample is dominated by B 0 s → D − s µ + (ν, π 0 , γ), B 0 → D − µ + (ν, π 0 , γ) and B + → D − µ + (ν, π + , γ) decays, where no specific intermediate states are required other than those mentioned, and where at least one neutrino will occur together with any number of the other particles in the parentheses. These additional particles are ignored and so a clear B mass peak cannot be reconstructed.
For simplicity, to quantify the measured mass, M (Dµ), within its possible range, we define a "normalized mass", n, relative to the known masses (M 0 ) of the B, D, and µ: We require 0.24 < n < 1.0, where the lower cut mainly removes low-mass combinatorial background candidates. The K + K − π + invariant mass distribution and the normalized mass distribution (n) of the selected candidates are shown in Fig. 1, in which the D + s and D + peaks can clearly be seen over the combinatorial background.
Determination of the initial-state flavour is performed using the standard LHCb flavourtagging algorithms, which are described in detail elsewhere [5,6,18]. These algorithms rely on the reconstruction of particles that were produced in association with, and are flavourcorrelated with, the signal B-meson. The correlations arise either from fragmentation, which often produces a kaon or pion of specific charge correlated with the signal, or from "opposite-side" decays, where the decay products of the partner b quark are reconstructed (e.g. a muon). A neural network combines tagging decisions for the best tagging power [6].
A hypothesis is required for the nature of the reconstructed candidate, either B 0 s or B 0 , in order to choose the tagging algorithms to be applied and to select the appropriate mass with which to calculate n. A split around the midpoint between the D + s and D + peaks is used. only opposite-side tags are used, to reduce the difference between B + and B 0 tagging performance and thus better constrain the B + background (see Secs. 5 and 6). The flavour-tagged dataset comprises 594,845 selected candidates.
Two techniques are employed to measure the mixing frequencies: (a) multidimensional log-likelihood maximization, simultaneously fitting ∆m s and ∆m d ; (b) model-independent Fourier analysis, used as a cross-check, which determines ∆m s with good precision, but ∆m d with a very poor precision. Both methods use a common determination of the proper decay time and so share a portion of the corresponding systematic effects.

Proper decay-time distributions
To obtain the B-meson decay times, a correction is applied for the momentum lost due to missing particles, using a k-factor method as employed in many previous measurements (see, for example, Refs. [19] and [20]). The k-factor [21] is a simulation-based statistical correction, where the average missing momentum in a simulated sample is used to correct the reconstructed momentum as a function of the reconstructed Dµ mass (as shown in Fig. 2). In this study we use a fourth-order polynomial to parameterize k as a function of the normalized Dµ mass (n from Eq. 2), which allows us to use the same correction for B 0 s and B 0 . With this approach, both ∆m s and ∆m d exhibit residual biases of around 1 %; these biases are known to good precision from the full simulation and are corrected in the final results.
The experimental resolution of the proper decay time (t) reduces the visibility of the mass, n (no units) µ s sample. For each event the ratio of reconstructed to generated momentum, p rec /p sim is plotted against the normalized Dµ mass (n in Eq. 2). The curve shows a fourth-order polynomial resulting from a fit to the mean of the distribution (in bins of n).
oscillations, smearing Eq. 1 with a resolution function R(t, t − t), where t is the true decay time and t is the measured value. The limited performance of the tagging also reduces the visibility of the oscillations. Our selection requirements include variables that are correlated with the decay time, leading to a time-dependent efficiency function, ε(t ). Thus Eq. 1 becomes where η is the tagging efficiency and ω is the mistag probability (the fraction of tags that assign the wrong flavour). We parameterize the time-dependent efficiency with an empirical "acceptance" function. Specifically Gaussian functions are used as motivated by data and full simulation studies [21], where G is the Gaussian function and the parameters are determined from fits to the data (typical values are σ 1,2 < 1 ps and µ 0 ≈ 0.01 ps). The k-factor is a relative correction for the average missing momentum at a given value of n; as shown in Fig. 2, the range of missing momenta is broad and varies from about 70 % at n = 0.2 to zero at n = 1. This large relative uncertainty on the corrected momentum (p ) dominates the decay time resolution, i.e.
per20.1ps2 per20.1ps2 B2decay2time2(t2and2t') approximately proportional to t (as seen in Fig. 3) and the decay time resolution worsens as decay time increases. This dependence is determined and parameterized from the full simulation. We may choose between a parameterization in terms of either the generated ("true") decay time, using a numerical convolution, or in terms of the measured decay time, using analytical methods; the latter is the default approach. The resolution dependence is well-fitted with second or third order polynomials.
5 Multivariate fits to the data A binned, multidimensional, log-likelihood fit to the data is made, using the Root and embedded RooFit fitting frameworks [22,23]. In order to improve the resolution on the fitted value of ∆m s , the sample is divided into two subsamples about normalized mass n = 0.56 (with this value determined using fast-simulation "pseudo-experiment" studies), and the two subsamples are fitted simultaneously as described below. There are 101,000 bins over the K + K − π + mass, the measured decay time (t ), the normalized mass (n < 0.56 and n > 0.56), and the tagging result (even and odd). Seven categories of signal and background are assigned component probability density functions (PDFs) whose fractions and shape parameters are left free in the fits to the data. The backgrounds are categorized in terms of their shapes in the mass and decay-time observables. Using the M (K + K − π + ) distribution we separate out peaking D + (s) components from combinatorial background components. Each of these categories can be further divided into two based on their decay-time shape. We use the term "prompt" to describe fake candidates containing particles exclusively produced in the primary pp interaction, and the term "detached" for candidates that contain at least one daughter of a secondary decay and which therefore tend to exhibit a significantly larger lifetime. Candidates for the signal B-decays of interest must be both detached and peaking. The signal-like decays are usually grouped together in the fit; however, we separate the specific background contribution of B + within the D + peak and fit that directly. These components are shown in together in Fig. 4 and separately in different M (K + K − π + ) regions in Figs. 5 and 6. Each mass PDF is a Gaussian function or a Chebychev polynomial (Fig. 4), and each background decay-time PDF is a simple exponential with an appropriate acceptance function as previously described (Fig. 6). For the signal decay-time shape we use the model described in Eq. 3, with one instance for each peak. The majority of our sensitivity arises from the mixing asymmetry, whose time-dependent fit in the signal regions is shown in Fig. 7. Any odd/even asymmetry is assumed to be constant as a function of time for prompt backgrounds and for backgrounds that are known not to mix (B + , Λ b , etc.). Generic detached backgrounds are allowed to have a time-dependent asymmetry varying as an arbitrary quadratic polynomial.
The proportion of B + → D − µ + (ν, π + , γ) with respect to B 0 → D − µ + (ν, π 0 , γ) is fixed to 11 % with a ±2 % uncertainty, using the ratio of known fragmentation functions and branching fractions [1]. Based on the full LHCb simulation, this ratio is corrected by 25 % to account for differences in the reconstruction and tagging efficiencies, with the full value of this correction taken as a systematic uncertainty. We fix ∆Γ s using the result of a recent LHCb analysis [24], and ∆Γ d is fixed to zero.
Only the signal mass shapes and the parameters of interest, ∆m s and ∆m d , are shared  between the two subsamples in n, which are fitted simultaneously. The goodness of the fit is verified with a local density method [25], which finds a p-value of 19.6 %.
6 Fit results and systematic uncertainties Table 1 gives the fitted values for some important quantities. In principle the signal lifetimes are also measured, but these have very large systematic uncertainties and so no results are quoted. The systematic uncertainties on ∆m s and ∆m d are first discussed before the final results are given.
Several sources of systematic uncertainty on the main measured quantities, ∆m s and ∆m d , are considered, as summarized in Table 2. The majority of the systematic uncertainties are obtained from the data.
• The k-factor: the k-factor correction is a simulation-based method, and so differences between the simulation and reality that modify the visible and invisible momenta potentially invalidate the correction. Such differences could for example be in D * * branching fractions or form factors. Large-scale pseudo-experiment studies are combined with full simulations to vary these underlying distributions within their uncertainties and examine biases produced on the fitted ∆m values. Small relative uncertainties are found, 0.3 % for ∆m s and 1.0 % for ∆m d , representing the ultimate limit of this technique without further knowledge of the various sub-decays. The left plot shows the asymmetry for events for a region of ±20 MeVc −2 around the D + s mass peak, and the right plot shows the corresponding asymmetry around the D + mass peak. The black points show the data and the curves are projections of the fitted PDF. On the left plot the fast oscillations of B 0 s are gradually washed out by the increasingly poor decay-time resolution.
9 certainties have been studied using detector survey data and various control modes; they are well determined and small in comparison to the statistical uncertainties [26].
• Values of ∆Γ: The quantities ∆Γ d and ∆Γ s are nominally constant in our fits. When they are varied, within ±5 % for ∆Γ d (chosen to well-cover the experimental range given the lack of information on its sign [1]) and within the known uncertainty on ∆Γ s [24], our result is only marginally affected.
• Model bias: a correction has been made for the 1 % residual frequency bias seen in full simulation studies, as discussed in Sec. 4. This is taken directly from simulation and half of the correction is assigned a systematic uncertainty.
• Signal proper-time model: the fit is repeated with two different time-resolution models. (a) When the resolution is parameterized as a function of true rather than measured decay time, using full numerical convolution, a (0.09, 0.002) ps −1 variation is seen in (∆m s , ∆m d ). (b) When a time-independent (average) resolution is used, a 0.001 ps −1 variation is seen in ∆m d (this method is not applicable to the measurement of ∆m s due to many factors; crucially, within the time frame of any single B 0 s oscillation the decay time resolution worsens by an appreciable fraction of the oscillation period, seen in Figs. 3 and 7). With other modifications to the signal model (resolutions and acceptances) a larger variation in ∆m d of 0.007 ps −1 is found.
• Other models and binning: the order of the Chebychev polynomial is varied, Crystal Ball functions are used for the mass peak shapes, and the background parameterizations and the binning schemes are varied. Out of these modifications, the binning Table 1: A selection of fitted parameter values, for which statistical uncertainties only are given.
The B 0 s signal fraction includes contributions from any detached D + s production. When the omitted fractions (of combinatorial background components) are included, the total fraction sums to unity within each n region separately.

Quantity
Normalized mass region Low-n High-n scheme has the largest effect. Resulting uncertanties of 0.05 ps −1 and 0.001 ps −1 are assigned to ∆m s and ∆m d , respectively.
• Assumptions on B + decays: The ∆m d measurement is sensitive to χ d , the integrated mixing probability, which in turn is sensitive to the non-mixing B + -background. We hold constant several B + -background parameters in the baseline fit, determined from the full simulation. Many features of the B + background fit are varied to evaluate systematic variations, including the fraction, the lifetime, and the corrections for relative tagging performance. The largest uncertainty arises from tagging performance corrections and for this a 0.008 ps −1 uncertainty is assigned to ∆m d . It is possible to leave one or more of these parameters free during the fit, but the loss in statistical precision is prohibitive.
To obtain a measure for the significance of the observed oscillations, the global likelihood minimum for the full fit is compared with the likelihood of the hypotheses corresponding to the edges of our search window (∆m = 0 or ∆m ≥ 50 ps −1 ). Both would result in almost flat asymmetry curves (cf. Fig. 7) corresponding to no observed oscillations. We reject the null hypothesis of no oscillations by the equivalent of 5.8 standard deviations for B 0 s oscillations and 13.0 standard deviations for B 0 oscillations.

Fourier analysis
The full fit as described above was performed in the time domain, but measurement of the mixing frequency can also be made directly in the frequency domain as a cross-check, using well-established Fourier transform techniques [27][28][29]. The cosine term in Eq. 3 has a different sign for the odd and even samples, where the lifetime, acceptance, and other features are shared; this simplifies the analysis in the frequency domain. Any Fourier components not arising from mixing are suppressed by subtracting the odd Fourier spectrum from the even spectrum and no parameterizations of the background shapes, signal shapes, or decay-time resolution are required, allowing a model-independent measurement of the mixing frequencies. We search for the ∆m s peak in the subtracted Fourier spectrum, shown in Fig. 8. Extensive fast simulation pseudo-experiments have shown that the value of ∆m s is obtained reliably and with a reasonable precision using this method; however ∆m d is heavily biased and has a large uncertainty, and so a result is not quoted. Since residual components of the Fourier spectrum are of much lower frequency than the ∆m s component, and several complete oscillation periods of ∆m s are observable, the search for a spectral peak is relatively free from complications. For ∆m d , however, the relatively low frequency is similar to that of many other features of the data, and only a single oscillation period is observed; therefore the determination of ∆m d is difficult with this simple model-independent approach. is constructed from bins of the K + K − π + mass which are 25 MeVc −2 in width, analysed in steps of 5 MeVc −2 such that a smooth image is produced. The colour scale (blue-green-yellow-red) is an arbitrary linear representation of the signal intensity; dark blue is used for zero and below. The vertical dashed line is drawn at 18.0 ps −1 . The apparent double-peak structure is an artifact of this image. On the right a slice around the D + s mass region shows only the peak as used to measure the central value and rms width.
Taking the spectrum for events in a 25 MeVc −2 bin around the D + s mass, we find a clear and separated peak (Fig. 8, right). The rms width of the peak is 0.4 ps −1 , around a peak value of 17.95 ps −1 ; the rms can be used as a model-independent proxy for the statistical uncertainty. To further evaluate the expected statistical fluctuation in the peak value, we perform a large set of fast simulation pseudo-experiments taking the result of the multivariate fit as a model for signal and background. The uncertainty found from the simulation studies is 0.32 ps −1 , slightly smaller than given by the rms. We report ∆m s = (17.95 ± 0.40 (rms) ± 0.11 (syst)) ps −1 , in order to be model-independent. Systematic uncertainties arise from the detector alignment and the k-factor correction method, common to both measurement techniques, as quantified previously in Sec. 6.

Conclusion
The mixing frequencies for neutral B mesons have been measured using flavour-specific semileptonic decays. To correct for the momentum lost to missing particles, a simulationbased kinematic correction, known as the k-factor, was adopted. Two techniques were used to measure the mixing frequencies: a multidimensional simultaneous fit to the K + K − π + mass distribution, the decay-time distribution, and tagging information; and a simple Fourier analysis. The results of the two methods were consistent, with the first method being more precise. We obtain ∆m s = (17.93 ± 0.22 (stat) ± 0.15 (syst)) ps −1 , ∆m d = (0.503 ± 0.011 (stat) ± 0.013 (syst)) ps −1 .
We reject the hypothesis of no oscillations by 5.8 standard deviations for B 0 s and 13.0 standard deviations for B 0 . This is the first observation of B 0 s -B 0 s mixing to be made using only semileptonic decays.