Test of lepton universality with $B^{0} \rightarrow K^{*0}\ell^{+}\ell^{-}$ decays

A test of lepton universality, performed by measuring the ratio of the branching fractions of the B$^{0}$ → K$^{*0}$ μ$^{+}$ μ$^{−}$ and B$^{0}$ → K$^{*0}$ e$^{+}$ e$^{−}$ decays, $R_{K^{*0}}$, is presented. The K$^{*0}$ meson is reconstructed in the final state K$^{+}$ π$^{−}$, which is required to have an invariant mass within 100 MeV/c$^{2}$ of the known K$^{*}$(892)$^{0}$ mass. The analysis is performed using proton-proton collision data, corresponding to an integrated luminosity of about 3 fb$^{−1}$, collected by the LHCb experiment at centre-of-mass energies of 7 and 8 TeV. The ratio is measured in two regions of the dilepton invariant mass squared, q$^{2}$, to be
$ R_{K^{*0}} = \begin{cases} 0.66^{+0.11}_{-0.07}\,(\mathrm{stat}) \pm 0.03\,(\mathrm{syst}) & \text{for } 0.045 < q^{2} < 1.1~\mathrm{GeV}^{2}/c^{4}, \\ 0.69^{+0.11}_{-0.07}\,(\mathrm{stat}) \pm 0.05\,(\mathrm{syst}) & \text{for } 1.1 < q^{2} < 6.0~\mathrm{GeV}^{2}/c^{4}. \end{cases} $


Introduction
In the Standard Model (SM) of particle physics, the electroweak couplings of leptons to gauge bosons are independent of their flavour and the model is referred to as exhibiting lepton universality (LU). Flavour-changing neutral-current (FCNC) processes, where a quark changes its flavour without altering its electric charge, provide an ideal laboratory to test LU. The SM forbids FCNCs at tree level and only allows amplitudes involving electroweak loop (penguin and box) Feynman diagrams. The absence of a dominant tree-level SM contribution implies that such transitions are rare, and therefore sensitive to the existence of new particles. The presence of such particles could lead to a sizeable increase or decrease in the rate of particular decays, or change the angular distribution of the final-state particles. Particularly sensitive probes for such effects are ratios of the type [1]
$ R_{H} = \frac{\int \frac{d\Gamma(B \to H \mu^{+}\mu^{-})}{dq^{2}}\, dq^{2}}{\int \frac{d\Gamma(B \to H e^{+}e^{-})}{dq^{2}}\, dq^{2}} , $
where H represents a hadron containing an s quark, such as a K or a K * meson. The decay rate, Γ, is integrated over a range of the squared dilepton invariant mass, q 2 . The R H ratios allow very precise tests of LU, as hadronic uncertainties in the theoretical predictions cancel, and are expected to be close to unity in the SM [1][2][3]. At e + e − colliders operating at the Υ (4S) resonance, the ratios R K ( * ) have been measured to be consistent with unity with a precision of 20 to 50% [4,5]. More recently, the most precise determination to date of R K in the q 2 range between 1.0 and 6.0 GeV 2 /c 4 has been performed by the LHCb collaboration. The measurement has a relative precision of 12% [6] and is found to be 2.6 standard deviations lower than the SM expectation [1]. Hints of LU violation have been observed in B → D ( * ) ℓ ν decays [7][8][9]. Tensions with the SM have also been found in several measurements of branching fractions [10][11][12] and angular observables [13,14] of rare b → s ℓ + ℓ − decays.
Models containing a new, neutral, heavy gauge boson [15][16][17][18][19][20] or leptoquarks [21,22] have been proposed to explain these measurements.
A precise measurement of R K * 0 can provide a deeper understanding of the nature of the present discrepancies [23]. Some of the leading-order Feynman diagrams for the B 0 → K * 0 ℓ + ℓ − decays, where ℓ represents either a muon or an electron, are shown in figure 1 for both SM and possible New Physics (NP) scenarios. If the NP particles couple differently to electrons and muons, LU could be violated. The K * 0 represents a K * (892) 0 meson, which is reconstructed in the K + π − final state by selecting candidates within 100 MeV/c 2 of the known mass [24]. No attempt is made to separate the K * 0 meson from S-wave or other broad contributions present in the selected K + π − region. The S-wave fraction contribution to the B 0 → K * 0 µ + µ − mode has been measured by the LHCb collaboration and found to be small [25]. Inclusion of charge-conjugate processes is implied throughout the paper, unless stated otherwise. The analysis is performed in two regions of q 2 that are sensitive to different NP contributions: a low-q 2 bin, between 0.045 and 1.1 GeV 2 /c 4 , and a central-q 2 bin, between 1.1 and 6.0 GeV 2 /c 4 . The lower boundary of the low-q 2 region corresponds roughly to the dimuon kinematic threshold. The boundary at 1.1 GeV 2 /c 4 is chosen such that φ(1020) → ℓ + ℓ − decays, which could potentially dilute NP effects, are included in the low-q 2 interval. The upper boundary of the central-q 2 bin at 6.0 GeV 2 /c 4 is chosen to reduce contamination from the radiative tail of the J/ψ resonance.
The measurement is performed as a double ratio of the branching fractions of the B 0 → K * 0 ℓ + ℓ − and B 0 → K * 0 J/ψ (→ ℓ + ℓ − ) decays,
$ R_{K^{*0}} = \frac{\mathcal{B}(B^{0} \to K^{*0}\mu^{+}\mu^{-})}{\mathcal{B}(B^{0} \to K^{*0}J/\psi(\to \mu^{+}\mu^{-}))} \bigg/ \frac{\mathcal{B}(B^{0} \to K^{*0}e^{+}e^{-})}{\mathcal{B}(B^{0} \to K^{*0}J/\psi(\to e^{+}e^{-}))} , $
where the two channels are also referred to as the "nonresonant" and the "resonant" modes, respectively. The experimental quantities relevant for the measurement are the yields and the reconstruction efficiencies of the four decays entering in the double ratio. Due to the similarity between the experimental efficiencies of the nonresonant and resonant decay modes, many sources of systematic uncertainty are substantially reduced. This helps to mitigate the significant differences in reconstruction between decays with muons or electrons in the final state, mostly due to bremsstrahlung emission and the trigger response. The decay J/ψ → ℓ + ℓ − is measured to be consistent with LU [24]. In order to avoid experimental biases, a blind analysis was performed. The measurement is corrected for final-state radiation (FSR). Recent SM predictions for R K * 0 in the two q 2 regions are reported in table 1. Note that possible uncertainties related to QED corrections are only included in Ref. [26], and these are found to be at the percent level. The R K * 0 ratio is smaller than unity in the low-q 2 region due to phase-space effects.
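The double ratio described above can be illustrated with a minimal sketch, assuming that each branching fraction is proportional to the fitted yield divided by the selection efficiency. The function name `double_ratio` and all numerical inputs are hypothetical and are not taken from the analysis.

```python
def double_ratio(n_mumu, n_jpsi_mumu, eff_mumu, eff_jpsi_mumu,
                 n_ee, n_jpsi_ee, eff_ee, eff_jpsi_ee):
    """Return (N/eff ratio for muons) / (N/eff ratio for electrons).

    Assumption: each branching fraction is proportional to yield / efficiency,
    so luminosity and production cross-section cancel in each single ratio.
    """
    muon_ratio = (n_mumu / eff_mumu) / (n_jpsi_mumu / eff_jpsi_mumu)
    electron_ratio = (n_ee / eff_ee) / (n_jpsi_ee / eff_jpsi_ee)
    return muon_ratio / electron_ratio


# Hypothetical inputs, chosen only to show the cancellation.
nominal = double_ratio(100.0, 10000.0, 0.40, 0.50,
                       20.0, 2000.0, 0.08, 0.10)

# A mismodelling common to both muon modes (here a 10% efficiency loss)
# drops out of the double ratio.
shifted = double_ratio(100.0, 10000.0, 0.40 * 0.9, 0.50 * 0.9,
                       20.0, 2000.0, 0.08, 0.10)
print(nominal, shifted)
```

The second call shows why the resonant mode is used as a normalisation: an efficiency bias shared by the nonresonant and resonant modes of one lepton type leaves the result unchanged.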
Table 1: Recent SM predictions for R K * 0 (fragment): EOS [30,31] 0.9964 ± 0.005 flav.io [32][33][34] 0.996 ± 0.002 JC [35]

The remainder of this paper is organised as follows: section 2 describes the LHCb detector, as well as the data and the simulation samples used; the experimental challenges in studying electrons as compared to muons are discussed in section 3; section 4 details how the simulation is adjusted in order to improve the modelling of the data; the selection of the candidates, rejection of the background and extraction of the yields are outlined in sections 5, 6 and 7; section 8 discusses the efficiency determination; the cross-checks performed and the systematic uncertainties associated with the measurement are summarised in sections 9 and 10, respectively; the results are presented in section 11; and section 12 presents the conclusions of the paper.

The LHCb detector and data set
The LHCb detector [36,37] is a single-arm forward spectrometer covering the pseudorapidity range 2 < η < 5, designed to study particles containing b or c quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the pp interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream of the magnet. The tracking system provides a measurement of momentum, p, with a relative uncertainty that varies from 0.5% at low values to 1.0% at 200 GeV/c. The minimum distance of a track to a primary vertex (PV), the impact parameter (IP), is measured with a resolution of (15 + 29/p T ) µm, where p T is the component of the momentum transverse to the beam, in GeV/c. Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter (ECAL) and a hadronic calorimeter (HCAL). Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers.
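As a small worked example of the impact-parameter resolution quoted above, (15 + 29/p T ) µm with p T in GeV/c, the following sketch evaluates the formula for a few transverse momenta. The function name `ip_resolution_um` is hypothetical.

```python
def ip_resolution_um(pt_gev):
    """Impact-parameter resolution in micrometres for a track with
    transverse momentum pt_gev (in GeV/c), using (15 + 29/pT) um."""
    if pt_gev <= 0.0:
        raise ValueError("transverse momentum must be positive")
    return 15.0 + 29.0 / pt_gev


# The resolution improves with pT and tends to 15 um at high pT.
for pt in (0.5, 1.0, 5.0, 29.0):
    print(pt, ip_resolution_um(pt))
```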
The trigger system consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction. The hardware muon trigger selects events containing at least one muon with significant p T (from ∼ 1.5 to ∼ 1.8 GeV/c, depending on the data-taking period). The hardware electron trigger requires the presence of a cluster of calorimeter cells with significant transverse energy, E T , (from ∼ 2.5 to ∼ 3.0 GeV, depending on the data-taking period) in the ECAL. The hardware hadron trigger requires the presence of an energy deposit with E T above ∼ 3.5 GeV in the calorimeters. The software trigger requires a two-, three- or four-track secondary vertex, with a significant displacement from the PV. At least one charged particle must have significant p T and be inconsistent with originating from any PV. A multivariate algorithm [38] is used for the identification of secondary vertices consistent with the decay of a b hadron.
The analysis is based on pp collision data collected with the LHCb detector at centre-of-mass energies of 7 and 8 TeV during 2011 and 2012, and corresponding to an integrated luminosity of about 3 fb −1 . Samples of simulated B 0 → K * 0 ℓ + ℓ − and B 0 → K * 0 J/ψ (→ ℓ + ℓ − ) events are used to determine the efficiency to trigger, reconstruct and select signal events, as well as to model the shapes used in the fits for signal candidates. In addition, specific simulated samples are utilised to estimate the contributions from backgrounds and to model their mass distributions. The pp collisions are generated using Pythia [39] with a specific LHCb configuration [40]. Decays of hadronic particles are described by EvtGen [41], in which FSR is generated using Photos [42], which is observed to agree with a full QED calculation at the level of ∼ 1% [26]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [43] as described in Ref. [44].

Electron reconstruction effects
The experimental environment in which the LHCb detector operates leads to significant differences in the treatment of decays involving muons or electrons in the final state. The two types of leptons behave differently when travelling through the detector material. Electrons emit a much larger amount of bremsstrahlung which, if not accounted for, would result in a significant degradation of the momentum resolution and consequently in a degradation of the B mass resolution. If the radiation occurs downstream of the dipole magnet, the photon energy is deposited in the same calorimeter cell as that of the lepton, and the momentum of the electron is correctly measured. If the photons are emitted upstream of the magnet, the electron and photon deposit their energy in different calorimeter cells, and the electron momentum is evaluated after bremsstrahlung emission. However, for both types of emissions, the ratio of the energy detected in the ECAL to the momentum measured by the tracking system, an important variable to identify electrons, remains unbiased.
A dedicated bremsstrahlung recovery procedure is used to improve the electron momentum reconstruction. Searches are made within a region of the ECAL defined by the extrapolation of the electron track upstream of the magnet for energy deposits with E T > 75 MeV that are not associated with charged tracks. Such "bremsstrahlung clusters" are added to the measured electron momentum. If the same cluster can be associated with both the e + and the e − , its energy is added to one of the two electrons at random. In B 0 → K * 0 J/ψ (→ e + e − ) decays, one bremsstrahlung cluster is added to either electron of the pair in about half of the cases; the remaining half is equally split between cases when no bremsstrahlung cluster is found, or two or more clusters are added. These fractions are reproduced well by the simulation and depend only weakly on q 2 . The bremsstrahlung recovery procedure is limited in three ways: the energy threshold of the clusters that are added; the calorimeter acceptance and resolution; and the presence of energy deposits wrongly interpreted as bremsstrahlung clusters. These limitations degrade the resolution of the reconstructed invariant masses of both the dielectron pair and the B candidate.
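The logic of the recovery procedure can be sketched as follows. This is a simplified illustration, not the LHCb reconstruction code: momenta are treated as scalars, the cluster records (`et`, `energy`, `has_track`, `matches`) are a hypothetical data layout, and the geometric matching of clusters to the extrapolated track is assumed to have been done already.

```python
import random

ET_THRESHOLD_MEV = 75.0  # minimum cluster transverse energy, from the text


def recover_bremsstrahlung(p_plus, p_minus, clusters, rng=None):
    """Add bremsstrahlung clusters to the electron momenta (all in MeV).

    Each cluster is a dict with keys:
      'et'        transverse energy of the cluster
      'energy'    energy of the cluster
      'has_track' True if a charged track points at the cluster
      'matches'   set of electrons ('e+', 'e-') whose upstream track
                  extrapolation is compatible with the cluster
    Returns the corrected momenta and the number of clusters added to each.
    """
    rng = rng or random.Random()
    corrected = {"e+": p_plus, "e-": p_minus}
    n_added = {"e+": 0, "e-": 0}
    for cluster in clusters:
        if cluster["et"] <= ET_THRESHOLD_MEV or cluster["has_track"]:
            continue  # too soft, or associated with a charged track
        matches = sorted(cluster["matches"])
        if not matches:
            continue
        # A cluster compatible with both electrons is given to one at random.
        target = matches[0] if len(matches) == 1 else rng.choice(matches)
        corrected[target] += cluster["energy"]
        n_added[target] += 1
    return corrected, n_added


example_clusters = [
    {"et": 50.0, "energy": 900.0, "has_track": False, "matches": {"e+"}},
    {"et": 300.0, "energy": 4000.0, "has_track": True, "matches": {"e-"}},
    {"et": 150.0, "energy": 2000.0, "has_track": False, "matches": {"e+"}},
    {"et": 120.0, "energy": 1000.0, "has_track": False, "matches": {"e+", "e-"}},
]
corrected, n_added = recover_bremsstrahlung(
    20000.0, 15000.0, example_clusters, rng=random.Random(1))
print(corrected, n_added)
```

The per-electron cluster counts returned here correspond to the quantity that defines the zero, one, or two-or-more cluster cases mentioned above.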
Since the occupancy of the calorimeters is significantly higher than that of the muon stations, the constraints on the trigger rate require that higher thresholds are imposed on the electron E T than on the muon p T . In the central-q 2 region the higher threshold causes a loss of about half of the electron signal. The efficiency decreases slightly at lower q 2 values. To partially mitigate this effect, decays with electrons in the final state can also be selected through the hadron hardware trigger, using clusters associated with the K * 0 decay products, or by any hardware trigger from particles in the event that are not associated with the signal candidate.
In decays with electrons, since the mass resolution of the reconstructed B candidate is worse than in final states with muons, the background contamination in the signal region is larger. The level of combinatorial background, arising from the accidental association of particles produced by different b- and c-hadron decays, is also higher in such channels, due to a larger number of electron candidates. As a result, the discriminating power of the fits to extract the signal yields is reduced (see section 7). Differences due to bremsstrahlung and the trigger response lead to a reconstruction efficiency for the B 0 → K * 0 J/ψ (→ e + e − ) decays that is about five times smaller than for the B 0 → K * 0 J/ψ (→ µ + µ − ) decays.

Corrections to the simulation
In order to optimise the selection criteria and accurately evaluate the efficiencies, a set of corrections is determined from unbiased control samples selected from the data. The procedure is applied to the simulated samples of the nonresonant and resonant modes.
The first correction accounts for differences between simulation and data in the particle identification (PID) performance [45]. The PID efficiencies are directly measured using a tag-and-probe method on high-purity data samples of pions and kaons from D * + → D 0 (→ K − π + )π + decays. Similarly, the electron and muon identification efficiencies are obtained from B + → K + J/ψ (→ ℓ + ℓ − ) decays. Corrections are determined as a function of the track momentum and pseudorapidity.
The second step of the procedure adjusts the simulation for the charged-track multiplicity in the event, which is not described well in simulation. A small correction for the B 0 kinematics is also applied. Resonant B 0 → K * 0 J/ψ (→ µ + µ − ) decays are used since the muon triggers are observed to be well modelled in simulation.
The third step corrects the simulation of the trigger response for both the hardware and software levels using a tag-and-probe technique. Whenever possible, B 0 → K * 0 J/ψ (→ µ + µ − ) decays are used as a control sample in place of B 0 → K * 0 J/ψ (→ e + e − ) decays in order to take advantage of the larger sample size. In such cases, the two decays are compared and found to give consistent results. The tag sample is defined by events where the hardware trigger is fired by activity in the event not associated with any of the signal decay particles. Alternatively, when probing the leptonic (hadronic) hardware triggers, the tag is required to have triggered the hadronic (leptonic) hardware trigger. The corrections for the leptonic hardware triggers are parameterised as a function of the cluster E T or track p T . The hadron hardware trigger efficiency is known to be sensitive to tracks overlapping in the HCAL, however, a good description can be obtained when the efficiency is measured as a function of the p T of the K + π − pair instead of the kaon or the pion independently. Corrections are determined separately in the different calorimeter regions [36], in order to take into account potential differences due to different occupancies. When the hardware trigger is fired by activity in the event not associated with any of the signal decay particles, the correction is determined as a function of the B 0 p T and the charged-track multiplicity in the event in order to take into account correlations in the production between the two b hadrons in the event. For the software trigger, the corrections are determined as a function of the minimum p T of the B 0 decay products.
Finally, residual differences between data and simulation in the reconstruction performance are accounted for using B 0 → K * 0 J/ψ (→ ℓ + ℓ − ) candidates to which the full selection is applied, as well as additional requirements to further reduce the background contamination. The corrections are determined by matching the distribution of the B 0 kinematics and vertex fit quality in simulation to the data, separately for muon and electron samples.
The correction factors are determined sequentially as histograms, with the previous corrections applied before deriving the subsequent one. To avoid biases in the procedure due to common candidates being used for both the determination of the corrections and the measurement, a k-folding [46] approach with k = 10 is adopted. To dilute the dependence on the choice of the binning schemes, all corrections are linearly interpolated between adjacent bins. After all the corrections are applied to the simulation, a very good agreement with the data is obtained.
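Two ingredients of this procedure, the linear interpolation of a binned correction and the k-fold bookkeeping, can be sketched as below. This is an illustration under stated assumptions: the interpolation is taken between bin centres and held flat beyond the outermost centres, and candidates are assigned to folds by a hypothetical integer identifier modulo k. Neither choice is specified in the text.

```python
from bisect import bisect_right


def interpolated_weight(x, edges, values):
    """Correction weight at x from a histogram with the given bin edges and
    per-bin values, linearly interpolated between adjacent bin centres."""
    centres = [0.5 * (lo + hi) for lo, hi in zip(edges[:-1], edges[1:])]
    if x <= centres[0]:
        return values[0]
    if x >= centres[-1]:
        return values[-1]
    i = bisect_right(centres, x) - 1
    frac = (x - centres[i]) / (centres[i + 1] - centres[i])
    return values[i] + frac * (values[i + 1] - values[i])


def folds_used_for(candidate_id, k=10):
    """Folds whose candidates may be used to derive the correction applied
    to this candidate: all folds except the candidate's own."""
    own_fold = candidate_id % k
    return [fold for fold in range(k) if fold != own_fold]


# Hypothetical three-bin correction histogram.
edges = [0.0, 2.0, 4.0, 6.0]
values = [1.0, 1.2, 0.8]
print(interpolated_weight(2.0, edges, values))  # halfway between first two centres
print(folds_used_for(23))
```

Interpolating between centres avoids the step changes at bin edges that a plain histogram lookup would introduce.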

Selection of signal candidates
A B 0 candidate is formed from a pair of well-reconstructed oppositely charged particles identified as either muons or electrons, combined with two well-reconstructed oppositely charged particles, one identified as a kaon and the other as a pion. The K + π − invariant mass is required to be within 100 MeV/c 2 of the known K * 0 mass. The kaon and pion must have p T exceeding 250 MeV/c, while for the muons (electrons) p T > 800 (500) MeV/c is required. Only dilepton pairs with a good-quality vertex are used to form signal candidates. The K * 0 meson and ℓ + ℓ − pair are required to originate from a common vertex in order to form a B 0 candidate. When more than one PV is reconstructed, the one with the smallest χ 2 IP is selected, where χ 2 IP is the difference in χ 2 of a given PV reconstructed with and without the considered B 0 candidate. With respect to this selected PV, the impact parameter of the B 0 candidate is required to be small, its decay vertex significantly displaced, and the momentum direction of the B 0 is required to be consistent with its direction of flight. This direction is given by the vector between the PV and decay vertex. The distribution of q 2 as a function of the four-body invariant mass for the B 0 candidates is shown in figure 2 for both muon and electron final states. The requirements on the neural-network classifier and m corr (see section 5) are not applied. In each plot, the contributions due to the charmonium resonances are clearly visible at the J/ψ and ψ(2S) masses. For electrons, these distributions visibly extend above the nominal mass values due to the calorimeter resolution affecting the bremsstrahlung recovery procedure (see section 3). The empty region in the top left corresponds to the kinematic limit of the B 0 → K * 0 ℓ + ℓ − decay, while the empty region in the top right corresponds to the requirement that rejects the B + → K + ℓ + ℓ − background (see section 6).
The B 0 mass resolution and the contributions of signal and backgrounds depend on the way in which the event was triggered. The data sample of decay modes involving an e + e − pair is therefore divided into three mutually exclusive categories, which in order of precedence are: candidates for which one of the electrons from the B 0 decay satisfies the hardware electron trigger (L0E), candidates for which one of the hadrons from the K * 0 decay meets the hardware hadron trigger (L0H) requirements, and candidates triggered by activity in the event not associated with any of the signal decay particles (L0I). For B 0 → K * 0 µ + µ − candidates, at least one of the two leptons must satisfy the requirements of the hardware muon trigger.
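The order of precedence that makes the three electron categories mutually exclusive can be written as a short function. The boolean inputs are hypothetical flags standing for the three hardware-trigger conditions described above.

```python
def electron_trigger_category(electron_fired, hadron_fired, independent_of_signal):
    """Assign an electron-mode candidate to exactly one trigger category.

    electron_fired         a signal electron satisfies the hardware electron trigger
    hadron_fired           a K*0 decay product satisfies the hardware hadron trigger
    independent_of_signal  the event was triggered by activity not associated
                           with the signal decay particles
    The checks are applied in order of precedence, so a candidate that
    satisfies several conditions is counted only once.
    """
    if electron_fired:
        return "L0E"
    if hadron_fired:
        return "L0H"
    if independent_of_signal:
        return "L0I"
    return None  # candidate not retained


print(electron_trigger_category(True, True, False))   # L0E takes precedence
print(electron_trigger_category(False, True, True))   # L0H before L0I
```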
For the B 0 → K * 0 J/ψ (→ µ + µ − ) decay mode, a dimuon mass interval within 100 MeV/c 2 of the known J/ψ mass is selected to identify candidates. It is not possible to apply a tight q 2 requirement to identify the B 0 → K * 0 J/ψ (→ e + e − ) mode as, despite the bremsstrahlung recovery, the e + e − invariant mass distribution has a long radiative tail towards low values. This tail can be seen in figure 2. The q 2 interval used to select B 0 → K * 0 J/ψ (→ e + e − ) candidates is between 6.0 and 11.0 GeV 2 /c 4 , with the lower limit corresponding to the upper boundary of the central-q 2 bin.
The separation of the signal from the combinatorial background is based on neural-network classifiers [47]. The same classifier is used for the resonant and nonresonant modes, but muon and electron channels are treated separately. The classifiers are trained using simulated B 0 → K * 0 ℓ + ℓ − decays, which have been corrected for known differences between data and simulation (see section 4), to represent the signal. Data candidates with K + π − ℓ + ℓ − invariant masses larger than 5400 MeV/c 2 and 5600 MeV/c 2 are used to represent background samples for the muon and electron channel, respectively. To best exploit the size of the available data sample for the training procedure, a k-folding technique [46] is adopted with k = 10. The variables used as input to the classifiers are: the transverse momentum, the quality of the vertex fit, the χ 2 IP , the χ 2 VD (the χ 2 on the measured distance between the PV and the decay vertex), and the angle between the direction of flight and the momentum of the B 0 candidate, the K + π − and the dilepton pairs; the minimum and maximum of the kaon and pion p T , and of their χ 2 IP ; the minimum and maximum of the lepton p T values, and of their χ 2 IP ; and finally, the most discriminating variable, the quality of the kinematic fit to the decay chain (this fit is performed with a constraint on the vertex that requires the B 0 candidate to originate from the PV). In each fold, only variables that significantly improve the discriminating power of the classifier are kept.
For the muon modes, a requirement on the four-body invariant mass of the B 0 candidate to be larger than 5150 MeV/c 2 excludes backgrounds due to partially reconstructed decays, B → K * 0 µ + µ − X, where one or more of the products of the B decay, denoted as X, are not reconstructed. A kinematic fit that constrains the dielectron mass to the known J/ψ mass allows the corresponding background to be separated from the B 0 → K * 0 J/ψ (→ e + e − ) signal by requiring the resulting four-body invariant mass to be at least 5150 MeV/c 2 . For the nonresonant electron mode, the partially reconstructed backgrounds can be reduced by exploiting the kinematics of the decay. The ratio of the K * 0 and the dielectron momentum components transverse to the B 0 direction of flight is expected to be unity, unless the electrons have lost some energy due to bremsstrahlung that was not recovered (see figure 3). In the approximation that bremsstrahlung photons do not modify the dielectron direction significantly, which is particularly valid for low dilepton masses, this ratio can be used to correct the momentum of the dielectron pair. The invariant mass of the signal candidate calculated using the corrected dielectron momentum, m corr , has a poor resolution that depends on χ 2 VD . Nevertheless, since the missing momentum of background candidates does not originate from the dielectron pair, m corr still acts as a useful discriminating variable. Signal and partially reconstructed backgrounds populate different regions of the two-dimensional plane defined by m corr and χ 2 VD (see figure 4). The requirements in this plane and on the classifier response are optimised simultaneously, but separately for each q 2 region. 
The optimisation maximises a figure of merit defined as N S / √(N S + N B ), where the expected signal yield, N S , is evaluated by scaling the observed number of B 0 → K * 0 J/ψ (→ ℓ + ℓ − ) candidates by the ratio of the branching fractions of the nonresonant and resonant modes, and the expected background yield, N B , is obtained by fitting the mass sidebands in data.
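A minimal sketch of this optimisation is given below. The working points, yields and branching-fraction ratio are hypothetical numbers chosen for illustration; in the analysis the background yield comes from sideband fits and the scan covers the classifier response together with the m corr and χ 2 VD plane.

```python
import math


def expected_signal(n_resonant, bf_ratio):
    """Expected nonresonant yield: observed resonant yield scaled by the
    ratio of nonresonant to resonant branching fractions."""
    return n_resonant * bf_ratio


def figure_of_merit(n_sig, n_bkg):
    """N_S / sqrt(N_S + N_B)."""
    return n_sig / math.sqrt(n_sig + n_bkg)


# Hypothetical working points: (label, expected signal, expected background).
working_points = [
    ("loose", 120.0, 600.0),
    ("medium", 100.0, 300.0),
    ("tight", 60.0, 100.0),
]
best = max(working_points, key=lambda wp: figure_of_merit(wp[1], wp[2]))
print(best[0], figure_of_merit(best[1], best[2]))
```

A tighter requirement lowers both yields, so the best working point is the one where the loss of signal is still outweighed by the background reduction.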
After the full selection, 1 to 2% of the events contain multiple candidates. This fraction is consistent between the resonant and nonresonant modes, and between final states with electrons and muons. About half of the multiple candidates are due to cases where the kaon is misidentified as the pion and vice versa. In all cases only one candidate, chosen randomly, is retained.

Exclusive backgrounds
Specific requirements are applied to reject backgrounds from b-hadron decays, while ensuring a negligible loss of signal, as verified using simulation. In the low-q 2 region, the size of the contamination from B 0 → K * 0 V (→ ℓ + ℓ − ) decays, where V is a ρ, ω or φ meson, is evaluated in Refs. [48,49]. The contamination due to direct decays or interference with the signal channel is found to be smaller than 2% and similar for muons and electrons. As a consequence, the residual effect in the double ratio is expected to be very small and can therefore be safely neglected.
Misreconstructed B 0 → K * 0 J/ψ (→ µ + µ − ) and B 0 → K * 0 ψ(2S)(→ µ + µ − ) decays can contaminate the signal region if the identities of one of the hadrons and one of the muons are swapped. To avoid this, the invariant mass of the hadron candidate (under the muon mass hypothesis) and the oppositely charged muon is required to be outside of a 60 MeV/c 2 interval around the known J/ψ or the ψ(2S) masses.
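The veto can be illustrated with a short calculation of the two-body invariant mass under the muon mass hypothesis. This is a sketch under assumptions: the 60 MeV/c 2 interval is interpreted here as a half-width around each resonance mass, which the text does not state explicitly, and the function names are hypothetical. Momenta are three-vectors in MeV/c.

```python
import math

MUON_MASS = 105.658   # MeV/c^2
JPSI_MASS = 3096.9    # MeV/c^2
PSI2S_MASS = 3686.1   # MeV/c^2


def invariant_mass(p1, m1, p2, m2):
    """Invariant mass of two particles with three-momenta p1, p2 and
    assigned masses m1, m2 (natural units, MeV)."""
    e1 = math.sqrt(sum(c * c for c in p1) + m1 * m1)
    e2 = math.sqrt(sum(c * c for c in p2) + m2 * m2)
    p_sum = [a + b for a, b in zip(p1, p2)]
    mass_sq = (e1 + e2) ** 2 - sum(c * c for c in p_sum)
    return math.sqrt(max(mass_sq, 0.0))


def passes_swap_veto(p_hadron, p_muon, half_width=60.0):
    """True if the hadron, treated as a muon, combined with the oppositely
    charged muon is NOT compatible with a J/psi or psi(2S)."""
    mass = invariant_mass(p_hadron, MUON_MASS, p_muon, MUON_MASS)
    return all(abs(mass - m0) > half_width for m0 in (JPSI_MASS, PSI2S_MASS))


# Back-to-back pair tuned to reconstruct exactly at the J/psi mass: vetoed.
p_star = math.sqrt((JPSI_MASS / 2.0) ** 2 - MUON_MASS ** 2)
print(passes_swap_veto((p_star, 0.0, 0.0), (-p_star, 0.0, 0.0)))
# A pair far from both resonances: kept.
print(passes_swap_veto((1000.0, 0.0, 0.0), (0.0, 1000.0, 0.0)))
```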
A large, nonpeaking background comes from the B 0 → D − ℓ + ν decay, with D − → K * 0 ℓ − ν, which has a branching fraction four orders of magnitude larger than that of the signal. In the rare case where both neutrinos have low energies, the signal selection will be less effective at rejecting this background. This decay can be separated from the signal by exploiting the angular distribution of the dilepton pair. For B 0 → D − ℓ + ν decays, the angle θ between the direction of the ℓ + in the dilepton rest frame and the direction of the dilepton in the B 0 rest frame tends to be small. This background is suppressed by requiring | cos θ | < 0.8.
When combined with a low-momentum π − meson from the rest of the event, B + → K + ℓ + ℓ − decays can pass the selection and populate the upper mass sideband region that is used to represent the combinatorial background for the training of the neural-network classifiers. Such decays are vetoed by requiring the invariant mass of the K + ℓ + ℓ − combination to be less than 5100 MeV/c 2 . Candidates where the π − from the K * 0 is misidentified as a kaon and paired with a π + are similarly rejected. To suppress background from B 0 s → φ ℓ + ℓ − decays, with φ → K + K − where one of the kaons is misidentified as a pion, the invariant mass of the two hadrons computed under the K + K − mass hypothesis is required to be larger than 1040 MeV/c 2 .

Fits to the K + π − ℓ + ℓ − invariant mass distributions
The signal yields are determined using unbinned extended maximum likelihood fits to the four-body invariant mass, m(K + π − ℓ + ℓ − ), of the selected candidates in each q 2 interval and for each lepton type. The reconstructed invariant mass is calculated using a kinematic fit with a constraint on the vertex that requires the B 0 candidate to originate from the PV. In order to improve the quality and stability of the results, the fits are performed simultaneously on the nonresonant and resonant modes, and some parameters are shared.
For the muon channel, the fit is performed in an invariant mass window of 5150-5850 MeV/c 2 . The low edge is chosen to reject the partially reconstructed background that populates the low mass region. The probability density function (PDF) for the signal is defined by a Hypatia function [50], where the parameters are fixed from simulation. However, in order to account for possible residual discrepancies with data, the mean and width are allowed to vary freely in the fit, independently for the resonant and nonresonant modes and in each q 2 region. The combinatorial background is parameterised using an exponential function, which has a different slope in the resonant and nonresonant modes, and in each q 2 region, that is free to vary in the fit. For the resonant mode, two additional sources of background are included: Λ 0 b → K + pJ/ψ (→ µ + µ − ) decays, where the p candidate is misidentified as a π − meson, and B 0 s → K * 0 J/ψ (→ µ + µ − ) decays. The former are described using a kernel estimation technique [51] applied to simulated events for which the K + π − invariant mass distribution has been matched to data from Ref. [52]. The latter are modelled using the same PDF as for the signal, but with the mean value shifted by the known difference between the B 0 and the B 0 s masses. The equivalent backgrounds to the nonresonant mode are found to be negligible.
For the electron channel, due to the limited resolution on the K + π − e + e − invariant mass, a wider window of 4500-6200 MeV/c 2 is used. The resolution on the reconstructed invariant mass of the B 0 and the background composition depends on the kinematics of the decay, as well as on the trigger category. For this reason, simultaneous fits to the four-body invariant mass of the B 0 → K * 0 J/ψ (→ e + e − ) and B 0 → K * 0 e + e − channels are performed separately in the three trigger categories. Following the strategy of Ref. [6], the K + π − e + e − signal PDF is observed to depend on the number of calorimeter clusters that are added to the dielectron candidate in order to correct for the effects of bremsstrahlung. Three bremsstrahlung categories are considered, depending on whether zero, one or more clusters are recovered. The PDF is described by the sum of a Crystal Ball function [53] (CB) and a wide Gaussian function. The CB function accounts for FSR and bremsstrahlung that is not fully recovered, and corresponds to over 90% of the total signal PDF. Cases where bremsstrahlung clusters were incorrectly associated are accounted for by the Gaussian function. The shape parameters and the fraction of candidates in each bremsstrahlung category are taken from simulation, the latter having been checked on data control channels (see figure 5). In order to account for possible data-simulation discrepancies, the mean (width) of the PDF for each trigger category is allowed to shift (scale). These shift and scale factors are common between the nonresonant and resonant PDFs. An additional scale factor is also applied to the parameter describing the tail of the CB functions. The combinatorial background is described by an exponential function with different slope parameters for the resonant and nonresonant modes, and in each trigger category and q 2 region, that are free to vary in the fit. 
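The signal shape for one bremsstrahlung category can be sketched as the sum of a Crystal Ball function and a wide Gaussian. This is an illustration only: the parameter values are hypothetical, the densities are normalised over the whole real line rather than over the fit window, and the shift and scale factors described above are omitted.

```python
import math


def crystal_ball_pdf(x, mean, sigma, alpha, n):
    """Normalised Crystal Ball density: Gaussian core with a power-law tail
    on the low-mass side (requires alpha > 0 and n > 1)."""
    a = abs(alpha)
    t = (x - mean) / sigma
    tail_area = n / a / (n - 1.0) * math.exp(-0.5 * a * a)
    core_area = math.sqrt(math.pi / 2.0) * (1.0 + math.erf(a / math.sqrt(2.0)))
    norm = 1.0 / (sigma * (tail_area + core_area))
    if t > -a:
        return norm * math.exp(-0.5 * t * t)
    coeff = (n / a) ** n * math.exp(-0.5 * a * a)
    offset = n / a - a
    return norm * coeff * (offset - t) ** (-n)


def gaussian_pdf(x, mean, sigma):
    """Normalised Gaussian density."""
    t = (x - mean) / sigma
    return math.exp(-0.5 * t * t) / (sigma * math.sqrt(2.0 * math.pi))


def signal_pdf(x, mean, sigma, alpha, n, wide_sigma, cb_fraction):
    """cb_fraction * CrystalBall + (1 - cb_fraction) * wide Gaussian."""
    return (cb_fraction * crystal_ball_pdf(x, mean, sigma, alpha, n)
            + (1.0 - cb_fraction) * gaussian_pdf(x, mean, wide_sigma))


# Hypothetical parameters (MeV/c^2); the CB fraction is above 90% as in the text.
params = dict(mean=5280.0, sigma=40.0, alpha=1.0, n=5.0,
              wide_sigma=120.0, cb_fraction=0.92)
low_side = signal_pdf(5280.0 - 120.0, **params)
high_side = signal_pdf(5280.0 + 120.0, **params)
print(low_side, high_side)  # the radiative tail makes the low side larger
```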
The shape of the partially reconstructed hadronic background, B → X(→ Y K * 0 )e + e − (where the decay product Y is not reconstructed), is obtained from simulation using a sample that includes decays of higher kaon resonances, X, such as K + 1 (1270) and K * + 2 (1430). The mass distribution is modelled using a kernel estimation technique separately in each trigger category and q 2 region. The fraction of this background is free to vary in both q 2 intervals. Due to the requirement on the four-body invariant mass with a J/ψ mass constraint (see section 5), there is no partially reconstructed background left to contaminate B 0 → K * 0 J/ψ (→ e + e − ) candidates. Due to the long radiative tail of the dielectron invariant mass, B 0 → K * 0 J/ψ (→ e + e − ) decays can contaminate the central-q 2 region and an additional background component is considered (see figure 2), however this contribution does not peak at the nominal B 0 mass. The distribution is modelled using simulated events, while the normalisation is constrained using a mixture of data and simulation. The contributions to the resonant modes from Λ 0 b → K + pJ/ψ (→ e + e − ) and B 0 s → K * 0 J/ψ (→ e + e − ) decays are treated following the same procedure as for the muon channel. The normalisations are fixed to the yields returned by the muon fit after correcting for efficiency differences between the two final states.
The results of the fits to the muon channels are shown in figure 6, while figure 7 displays the fit results for the electron channels, where the three trigger categories have been combined. The distribution of the normalised fit residuals of the B 0 → K * 0 J/ψ (→ µ + µ − ) mode shows an imperfect description of the combinatorial background at high mass values, although the effect on the signal yield is negligible. The resulting yields are listed in table 2.

Efficiencies
The efficiency for selecting each decay mode is defined as the product of the efficiencies of the geometrical acceptance of the detector, the complete reconstruction of all tracks, the trigger requirements and the full set of kinematic, PID and background rejection requirements. All efficiencies are determined using simulation that is tuned to data, as described in section 4, and account for bin migration in q 2 due to resolution, FSR and bremsstrahlung in the detector. The net bin migration amounts to about 1% and 5% in the low- and central-q 2 regions, respectively. The efficiency ratios between the nonresonant and the resonant modes, ε ℓ + ℓ − /ε J/ψ (ℓ + ℓ − ) , which directly enter in the R K * 0 measurement, are reported in table 3. Besides a dependence on the kinematics, the difference between the ratios in the two q 2 regions is almost entirely due to the different requirement on the neural-network classifier. The relative fraction of the electron trigger categories is checked using simulation to depend on q 2 as expected: the fraction of L0E decreases with decreasing q 2 , while that of L0H increases; on the other hand, the fraction of L0I depends only mildly on q 2 .
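As a compact restatement of the paragraph above, the total efficiency and the way the efficiency ratios enter a double ratio can be written as follows. The subscript labels (geom, reco, trig, sel) and the yield symbols N are notation introduced here for illustration, and the second expression assumes that each branching fraction is proportional to the corresponding yield divided by its efficiency.

```latex
\varepsilon = \varepsilon_{\mathrm{geom}} \times \varepsilon_{\mathrm{reco}}
              \times \varepsilon_{\mathrm{trig}} \times \varepsilon_{\mathrm{sel}}

R_{K^{*0}} = \frac{N_{\mu\mu}}{N_{J/\psi(\mu\mu)}} \cdot
             \frac{N_{J/\psi(ee)}}{N_{ee}} \cdot
             \frac{\varepsilon_{ee}/\varepsilon_{J/\psi(ee)}}
                  {\varepsilon_{\mu\mu}/\varepsilon_{J/\psi(\mu\mu)}}
```

Written this way, only the ratios of nonresonant to resonant efficiencies appear for each lepton flavour, which is why those ratios, rather than the absolute efficiencies, are the quantities tabulated.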

Cross-checks
A large number of cross-checks were performed before unblinding the result. The control of the absolute scale of the efficiencies is tested by measuring the ratio of the branching fractions of the muon and electron resonant channels, r J/ψ , which is expected to be equal to unity. This quantity represents an extremely stringent test, as it does not benefit from the large cancellation of the experimental systematic effects provided by the double ratio. The r J/ψ ratio is measured to be 1.043 ± 0.006 ± 0.045, where the first uncertainty is statistical and the second systematic. The same sources of systematic uncertainties as in the R K * 0 measurement are considered (see section 10). The result, which is in good agreement with unity, is observed to be compatible with being independent of the decay kinematics, such as p T and η of the B 0 candidate and final-state particles, and the charged-track multiplicity in the event.
Table 3: Efficiency ratios between the nonresonant and resonant modes, ε ℓ + ℓ − /ε J/ψ (ℓ + ℓ − ) , for the muon and electron (in the three trigger categories) channels. The uncertainties are statistical only.
e + e − (L0I) 0.789 ± 0.029 0.595 ± 0.020
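The statement that r J/ψ agrees with unity can be checked with a few lines of arithmetic. This sketch assumes that the statistical and systematic uncertainties are independent and approximately Gaussian, so that they can be added in quadrature; that assumption is made here and is not stated in the text.

```python
import math

r_jpsi = 1.043   # measured value quoted in the text
stat = 0.006     # statistical uncertainty
syst = 0.045     # systematic uncertainty

# Assumption: independent Gaussian uncertainties, combined in quadrature.
total = math.hypot(stat, syst)
pull = (r_jpsi - 1.0) / total

print(f"total uncertainty = {total:.4f}")
print(f"deviation from unity = {pull:.2f} standard deviations")
```

Under these assumptions the deviation from unity is slightly below one standard deviation, dominated by the systematic term.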
The extent of the cancellation of residual systematics in R K * 0 is verified by measuring a double ratio, R ψ(2S) , where B 0 → K * 0 ψ(2S)(→ ℓ + ℓ − ) decays are used in place of B 0 → K * 0 ℓ + ℓ − . The R ψ(2S) ratio, measured with a statistical precision of about 2%, is found to be compatible with unity within one standard deviation.
The branching fraction of the decay B 0 → K * 0 µ + µ − is measured and found to be in good agreement with Ref. [25]. Furthermore, the branching fraction of the B 0 → K * 0 γ decay, where decays with a photon conversion are used, is determined with a statistical precision of about 7% and is observed to be in agreement with the expectation within two standard deviations. The B 0 → K * 0 γ(→ e + e − ) selection and the determination of the signal yield closely follow those of the B 0 → K * 0 e + e − decay.
If no correction is made to the simulation, the ratio of the efficiencies changes by less than 5%. The relative population of the three bremsstrahlung categories is compared between data and simulation using both B 0 → K * 0 J/ψ (→ e + e − ) and B 0 → K * 0 γ(→ e + e − ) candidates to test possible q 2 dependence of the modelling. Good agreement is observed, as shown in figure 5.
The sPlot technique [54], where m(K + π − ℓ + ℓ − ) is used as the discriminating variable, is adopted to subtract statistically the background from the B 0 → K * 0 ℓ + ℓ − selected data, and to test the agreement between muons and electrons, and between data and simulation, using several control quantities (see figure 8): the q 2 distributions show good agreement in both q 2 regions; a clear K * 0 peak is visible in the K + π − invariant mass distributions, and the muon and electron channels show good agreement; while the distributions of the opening angle between the two leptons in the central-q 2 region are very similar between the muon and electron channels, this is not the case at low-q 2 due to the difference in lepton masses; the distribution of the distance between the K + π − and ℓ + ℓ − vertices shows that the pairs of hadrons and leptons consistently originate from the same decay vertex.
Figure 8: (hatched) Background-subtracted distributions for (darker colour) B 0 → K * 0 µ + µ − and (lighter colour) B 0 → K * 0 e + e − candidates, compared to (full line) simulation. From top to bottom: q 2 , K + π − invariant mass, m(K + π − ), opening angle between the two leptons, θ lepton , and projection along the beam axis of the distance between the K + π − and ℓ + ℓ − vertices, ∆z vertex . The distributions are normalised to unity. The hatched areas correspond to the statistical uncertainties only. The data are not efficiency corrected.
Table 4: Systematic uncertainties on the R K * 0 ratio for the three trigger categories separately (in percent). The total uncertainty is the sum in quadrature of all the contributions.

Systematic uncertainties
Since R K * 0 is measured as a double ratio, many potential sources of systematic uncertainty cancel. The remaining systematics and their effects on R K * 0 are summarised in table 4 and are described below.
Corrections to simulation: the uncertainty induced by the limited size of the simulated sample used to compute the efficiencies is considered; an additional systematic uncertainty is determined using binned corrections instead of interpolated ones; finally, since the data samples used to determine the corrections have a limited size, particularly for the electron hardware trigger, a systematic uncertainty is assessed with a bootstrapping technique [55].
Trigger efficiency: for the hardware triggers, the corrections to the simulation are determined using different control samples and the change in the result is assigned as a systematic uncertainty; for the software trigger, the corrections to the simulation show no dependence on the kinematics of the decays, and therefore only the statistical uncertainty on the overall correction is considered as a systematic uncertainty.
Particle identification: the particle identification response is calibrated using data; a systematic uncertainty due to the procedure and kinematic differences between these control samples and the signal modes is included; the effects due to the identification of leptons and hadrons are considered; however, discrepancies in the description of the latter are small and further cancel in the double ratio.
Kinematic selection: a systematic uncertainty due to the choice of the mass fit range and to the two-dimensional requirement on χ 2 VD and m corr is determined by comparing the efficiencies in simulation and background-subtracted samples of
Residual background: background due to B 0 → K * 0 J/ψ (→ e + e − ) decays where one of the hadrons is misidentified as an electron and vice versa is studied; using simulation that is tuned to data (see section 4) this contribution is estimated to be small; however, a few candidates with one electron of the dilepton pair having a low probability to be genuine are observed in background-subtracted data; a systematic uncertainty is assigned based on the distribution of the PID information of these candidates.
Mass fit: the systematic uncertainty due to the parameterisation of the signal invariant mass distributions is found to be negligible for the muon channel; for the electron channel, the signal PDF is changed from the sum of a CB and a Gaussian function to the sum of two CB functions, where the mean parameter is shared and, additionally, the mass shift and the width scale factors are constrained using the B 0 → K * 0 γ(→ e + e − ) decay mode instead of B 0 → K * 0 J/ψ (→ e + e − ); the relative fractions of the three bremsstrahlung categories are measured in data using B 0 → K * 0 J/ψ (→ e + e − ) and the observed differences with respect to simulation are used in the mass fit (see figure 5); for the backgrounds, a component that describes candidates where the hadron identities are swapped is added both to the muon and electron B 0 → K * 0 J/ψ (→ ℓ + ℓ − ) modes, and constrained to the expected values observed in simulation; the kernel of the nonparametric models is also varied, as well as the mixture of the K + 1 (1270) and K * + 2 (1430) components that is constrained using data [56]; the contributions to the systematic uncertainty from these sources are evaluated using pseudoexperiments that are generated with modified parameters and fitted with the PDFs used to fit the data.
Bin migration: for the electron channel, the degraded q 2 resolution due to bremsstrahlung emission causes a nonnegligible fraction of signal candidates to migrate in and out of the given q 2 bin; the effect is included in the efficiency determination, but introduces a small dependence on the shape of the differential branching fraction that no longer perfectly cancels in the ratio to the muon channel; pseudoexperiments are generated, where the parameters modelling the dΓ(B 0 → K * 0 e + e − )/dq 2 distribution are varied within their uncertainties [34]; the maximum spread of the variation in R K * 0 is taken as a systematic uncertainty; furthermore, the q 2 resolution is smeared for differences between data and simulation that are observed in the resonant mode.
r J/ψ ratio: the ratio of the efficiency-corrected yield of the resonant modes (see section 9) is expected to be unity to a very high precision; deviations from unity are therefore considered to be a sign of residual imperfections in the evaluation of the efficiencies; the r J/ψ ratio is studied as a function of various event and kinematic properties of the decay products, and the observed residual deviations from unity are used to assign a systematic uncertainty on R K * 0 .
For the R K * 0 measurement, all the uncertainties are treated as uncorrelated among the trigger categories, except for those related to particle identification, to the kinematic selection criteria, to the residual background, to the fit to the invariant mass and to bin migration.
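The bootstrapping technique mentioned under "Corrections to simulation" can be illustrated with a minimal sketch: the control sample is resampled with replacement many times, the quantity of interest is recomputed on each resample, and the spread of the results estimates the uncertainty due to the limited sample size. The function names, the toy control sample and the choice of a pass fraction as the statistic are all hypothetical and serve only to show the mechanics.

```python
import random
import statistics

def bootstrap_spread(sample, statistic, n_resamples=500, seed=1):
    """Standard deviation of a statistic over resamples drawn with replacement."""
    rng = random.Random(seed)
    size = len(sample)
    results = []
    for _ in range(n_resamples):
        resample = [sample[rng.randrange(size)] for _ in range(size)]
        results.append(statistic(resample))
    return statistics.stdev(results)

def pass_fraction(flags):
    """Fraction of candidates that satisfy a requirement (1 = pass, 0 = fail)."""
    return sum(flags) / len(flags)

# Toy control sample (hypothetical): 300 of 400 candidates pass a trigger requirement.
toy_sample = [1] * 300 + [0] * 100
spread = bootstrap_spread(toy_sample, pass_fraction)
print(f"pass fraction = {pass_fraction(toy_sample):.3f} +/- {spread:.3f}")
```

For this toy case the bootstrap spread should be close to the binomial expectation, sqrt(0.75 × 0.25 / 400) ≈ 0.022, which gives a simple way to validate the procedure before applying it to a quantity without a closed-form uncertainty.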

Results
The determination of R K * 0 exploits the log-likelihoods resulting from the fits to the invariant mass distributions of the nonresonant and resonant channels in each trigger category and q 2 region. Each log-likelihood is used to construct the PDF of the true number of decays, which is used as a prior to obtain the PDF of R K * 0 . The true number of decays is assumed to have a uniform prior. The three electron trigger categories are combined by summing the corresponding log-likelihoods. Uncorrelated systematic uncertainties are accounted for by convolving the yield PDFs with a Gaussian distribution of appropriate width. Correlated systematic uncertainties are treated by convolving the R K * 0 PDF with a Gaussian distribution. The one, two and three standard deviation intervals are determined as the ranges that include 68.3%, 95.4% and 99.7% of the PDF. In each q 2 region, the measured values of R K * 0 are found to be in good agreement among the three electron trigger categories (see figure 9). The results are given in table 5 and presented in figure 10, where they are compared both to the SM predictions (see table 1) and to previous measurements from the B factories [4,5].
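Part of the procedure above can be sketched on a grid of R K * 0 values: a log-likelihood curve is turned into a PDF under a uniform prior, the PDF is convolved with a Gaussian to account for a correlated systematic uncertainty, and an interval containing 68.3% of the PDF is extracted. The sketch works directly with a PDF of the ratio and does not reproduce the step that goes through the yield PDFs. The toy Gaussian log-likelihood, its parameters, the grid and the choice of the densest (rather than central) interval are assumptions made for illustration.

```python
import math

def pdf_from_loglik(loglik):
    """Normalised probabilities on a uniform grid, assuming a flat prior."""
    peak = max(loglik)
    weights = [math.exp(value - peak) for value in loglik]
    total = sum(weights)
    return [w / total for w in weights]

def smear(pdf, step, sigma):
    """Convolve a gridded PDF with a Gaussian of width sigma (grid units)."""
    half = int(math.ceil(5.0 * sigma / step))
    kernel = [math.exp(-0.5 * (k * step / sigma) ** 2) for k in range(-half, half + 1)]
    norm = sum(kernel)
    out = [0.0] * len(pdf)
    for i, p in enumerate(pdf):
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(pdf):
                out[j] += p * w / norm
    total = sum(out)
    return [v / total for v in out]

def densest_interval(grid, pdf, content=0.683):
    """Smallest set of grid points holding the requested probability content."""
    order = sorted(range(len(pdf)), key=lambda i: pdf[i], reverse=True)
    picked, accumulated = [], 0.0
    for i in order:
        picked.append(i)
        accumulated += pdf[i]
        if accumulated >= content:
            break
    return grid[min(picked)], grid[max(picked)]

# Toy inputs (hypothetical): a parabolic log-likelihood and a systematic width.
step = 0.001
grid = [i * step for i in range(1501)]
toy_loglik = [-0.5 * ((r - 0.70) / 0.08) ** 2 for r in grid]
smeared = smear(pdf_from_loglik(toy_loglik), step, 0.06)
lo, hi = densest_interval(grid, smeared)
print(f"68.3% interval: [{lo:.3f}, {hi:.3f}]")
```

With these toy inputs the smeared PDF is a Gaussian of width sqrt(0.08² + 0.06²) = 0.10, so the interval should come out close to [0.60, 0.80]; for an asymmetric likelihood the same code returns an asymmetric interval.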
The combined R K * 0 PDF is used to determine the compatibility with the SM expectations. The p-value, calculated by integrating the PDF above the expected value, is translated into a number of standard deviations. The compatibility with the SM expectations [26][27][28][29][30][31][32][33][34][35] is determined to be 2.1-2.3 and 2.4-2.5 standard deviations, for the low-q 2 and the central-q 2 regions, respectively, depending on the theory prediction used.
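The conversion from a p-value to a number of standard deviations can be sketched as follows. The text does not state whether a one-sided or a two-sided Gaussian convention is used; the one-sided convention below is an assumption, and the function names are hypothetical.

```python
from statistics import NormalDist

def tail_probability(grid, pdf, expected):
    """Fraction of a gridded, normalised PDF lying above the expected value."""
    return sum(p for x, p in zip(grid, pdf) if x > expected)

def to_standard_deviations(p_value):
    """One-sided Gaussian conversion of a p-value (an assumed convention)."""
    return NormalDist().inv_cdf(1.0 - p_value)

# Sanity check with a textbook value: a one-sided p-value of 2.275% is two sigma.
n_sigma = to_standard_deviations(0.02275)
print(f"{n_sigma:.2f} standard deviations")
```

A two-sided convention would instead use `NormalDist().inv_cdf(1.0 - p_value / 2.0)`, which gives a larger number of standard deviations for the same p-value.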

Conclusions
Table 5: Measured R K * 0 ratios in the two q 2 regions. The first uncertainties are statistical and the second are systematic. About 50% of the systematic uncertainty is correlated between the two q 2 bins. The 95.4% and 99.7% confidence level (CL) intervals include both the statistical and systematic uncertainties.
[27][28][29], EOS [30,31], flav.io [32][33][34] and JC [35]. The predictions are displaced horizontally for presentation. (right) Comparison of the LHCb R K * 0 measurements with previous experimental results from the B factories [4,5]. In the case of the B factories the specific vetoes for charmonium resonances are not represented.
This paper reports a test of lepton universality performed by measuring the ratio of the branching fractions of the decays B 0 → K * 0 µ + µ − and B 0 → K * 0 e + e − . The K * 0 meson is reconstructed in the final state K + π − , which is required to have an invariant mass within 100 MeV/c 2 of the known K * (892) 0 mass. Data corresponding to an integrated luminosity