1 Introduction

Decays of heavy-flavoured hadrons provide a wide range of opportunities to test for possible deviations from Standard Model (SM) expectations. The decays of the \({{D} ^0} \), \({{B} ^0} \) and \({{B} ^0_{s}} \) mesons into charged lepton pairs are particularly interesting since the chiral structure of the SM weak interaction suppresses their branching fractions, which can be predicted with small theoretical uncertainties [1]. These features make them highly sensitive to potential contributions from physics beyond the SM. Intense activity on the \({{B} ^{0}_{({s})}} \!\rightarrow {\mu ^+\mu ^-} \) channels has resulted in the observation of the \({{B} ^0_{s}} \!\rightarrow {\mu ^+\mu ^-} \) decay by the LHCb [2,3,4], CMS [5] and ATLAS [6] experiments, and a combined limit on the \({{B} ^0} \!\rightarrow {\mu ^+\mu ^-} \) branching fraction that approaches its SM value [7]. The experimental limits on the branching fractions of the decays \({{D} ^0} \!\rightarrow {e ^+e ^-} \)  [8], \({{D} ^0} \!\rightarrow {\mu ^+\mu ^-} \)  [9, 10], \({{B} _{({s})}^0} \!\rightarrow {e ^+e ^-} \)  [11] and \({{B} _{({s})}^0} \!\rightarrow {\tau ^+\tau ^-} \)  [12] are still several orders of magnitude above their SM predictions.

In contrast to the situation for pseudoscalar mesons, the leptonic decays of the excited vector \({{D} ^{*0}} \), \({{B} ^{*0}} \) and \({{B} ^{*0}_{s}} \) states have no chiral suppression. In an effective field theory approach, these decays involve the same operators as those of the pseudoscalar mesons [13]. Therefore, if sufficient experimental precision can be obtained, leptonic decays of the vector states provide a complementary approach to constraining the associated Wilson coefficients [14]. The challenge, however, is that the vector mesons can also decay via electromagnetic and (for the \({{D} ^{*0}} \) meson) strong interactions, which have widths many orders of magnitude larger than those for the weak decays. Therefore, the branching fractions of these weak decays are strongly suppressed to levels around \(10^{-11}\) [15, 16], unless there are large enhancements due to physics beyond the SM. For the leptonic \({D} ^{*0}\) decays, further suppression by the GIM mechanism [17] reduces the SM prediction for the branching fractions to the \(\mathcal{O}\left( 10^{-19}\right) \) level [15, 16]. Leptonic \({{D} ^{*0}} \), \({{B} ^{*0}} \) and \({{B} ^{*0}_{s}} \) decays have been considered as a potential probe of physics beyond the SM in Refs. [15, 16], with further investigations of the impact of particular extensions of the SM considered in Refs. [18,19,20,21,22].

An upper limit on the branching fraction of \({{D} ^{*0}} \!\rightarrow {e ^+e ^-} \) decays has been set by the CMD-3 collaboration, corresponding to \({\mathcal {B}} \left( {{D} ^{*0}} \!\rightarrow {e ^+e ^-} \right) < 1.7 \times 10^{-6}\) at 90% confidence level (CL) [23]. Throughout this paper, the \({{D} ^{*0}} \) symbol represents the \({{D} ^*} (2007)^0\) meson, and the inclusion of charge-conjugate processes is implied. There is no previous published search for \({{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} \) decays, but in the absence of chiral suppression the branching fractions for \({{D} ^{*0}} \!\rightarrow {e ^+e ^-} \) and \({{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} \) decays are expected to be the same. As recently discussed in Ref. [24], the copious production of heavy-flavoured hadrons in LHC collisions and the clean experimental signature make it worthwhile to search for \({{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} \) decays in LHCb data. The most promising approach appears to be with the \({{{B} ^-}} \!\rightarrow {{D} ^{*0}} ({\mu ^+\mu ^-})\, {{\pi } ^-} \) decay chain since the displaced vertex and exclusive final state provide powerful background rejection capabilities.

This paper describes the first search for the \({{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} \) decay in the decay chain \({{{B} ^-}} \!\rightarrow {{D} ^{*0}} ({\mu ^+\mu ^-})\,{{\pi } ^-} \). The analysis strategy is to reconstruct \({{{B} ^-}} \!\rightarrow {{\pi } ^-} {\mu ^+\mu ^-} \) candidates and apply selection criteria to reduce background. The dimuon invariant mass, \(m({\mu ^+\mu ^-})\), and the reconstructed \({B} \)-candidate invariant mass, \(m({{\pi } ^-} {\mu ^+\mu ^-})\), are used as discriminating observables in a fit, the results of which were not examined until the full analysis procedure had been finalised. The decay mode \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} ({\mu ^+\mu ^-})\, {{K} ^-} \) is used as a normalisation channel.

The branching fraction for the \({{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} \) decay is obtained through

$$\begin{aligned}{} & {} {\mathcal {B}} \left( {{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} \right) \nonumber \\{} & {} \quad = \frac{N_{{{D} ^{*0}} {{\pi } ^-}}}{N_{{{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{K} ^-}}} \cdot \frac{\varepsilon _{{{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{K} ^-}}}{\varepsilon _{{{D} ^{*0}} {{\pi } ^-}}} \cdot \frac{{\mathcal {B}} \left( {{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{K} ^-} \right) }{{\mathcal {B}} \left( {{{B} ^-}} \!\rightarrow {{D} ^{*0}} {{\pi } ^-} \right) }\nonumber \\{} & {} \qquad \cdot {\mathcal {B}} \left( {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} \!\rightarrow {\mu ^+\mu ^-} \right) , \end{aligned}$$
(1)

where \(N_{{{D} ^{*0}} {{\pi } ^-}}\) and \(\varepsilon _{{{D} ^{*0}} {{\pi } ^-}}\) denote the yield and efficiency for the signal mode, and \(N_{{{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{K} ^-}}\) and \(\varepsilon _{{{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{K} ^-}}\) denote the yield and efficiency for the normalisation mode. All branching fractions on the right-hand side of Eq. (1) are known [25]. The signal- and normalisation-mode yields are determined from invariant-mass fits, and the efficiency ratio is determined from simulation with corrections for data-simulation discrepancies as described below.

2 Detector and simulation

The LHCb detector [26, 27] is a single-arm forward spectrometer covering the pseudorapidity range \(2<\eta <5\), designed for the study of particles containing \(b \) or \(c \) quarks. The detector includes a high-precision tracking system consisting of a silicon-strip vertex detector surrounding the proton–proton (pp) interaction region [28], a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about \(4{\mathrm {\,Tm}}\), and three stations of silicon-strip detectors and straw drift tubes [29, 30] placed downstream of the magnet. The tracking system provides a measurement of the momentum, \(p\), of charged particles with a relative uncertainty that varies from 0.5% at low momentum to 1.0% at 200 \(\text {\,GeV\!/}c\). The minimum distance of a track to a primary pp collision vertex (PV), the impact parameter (IP), is measured with a resolution of \((15+29/p_{\textrm{T}})\,\upmu \text {m} \), where \(p_{\textrm{T}}\) is the component of the momentum transverse to the beam, in \(\text {\,GeV\!/}c\). Different types of charged hadrons are distinguished using information from two ring-imaging Cherenkov detectors [31]. Photons, electrons and hadrons are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers [32]. The online event selection is performed by a trigger [33, 34], which consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a two-level software stage, which reconstructs the full event.

Simulation is used to tune the event selection procedure, to model the shape of the dimuon and \(B \)-candidate invariant-mass distributions and to estimate efficiencies accounting for the effects of the detector acceptance, reconstruction and selection criteria. In the simulation, pp collisions are generated using Pythia  [35] with a specific LHCb configuration [36]. Decays of unstable particles are described by EvtGen  [37], in which final-state radiation is generated using Photos  [38]. The interaction of the generated particles with the detector, and its response, are implemented using the Geant4 toolkit [39] as described in Ref. [40]. The underlying pp interaction is reused multiple times, with an independently generated signal decay for each [41].

The \(B \) candidates reconstructed in simulation are weighted to correct for discrepancies between data and simulation associated with particle-identification efficiency, track-reconstruction efficiency and hardware trigger efficiency. Additional corrections are applied to account for discrepancies in \(B \)-production kinematics and event track multiplicity. After these weights are applied, the simulated distributions of all variables used in the analysis are in good agreement with the data.

The particle identification efficiencies are determined from data using unbiased samples of identified charged particles from \({{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} \!\rightarrow {\mu ^+\mu ^-} \) and \({{D} ^{*+}} \!\rightarrow {{D} ^0} ({{K} ^-} {{\pi } ^+})\,{{\pi } ^+} \) decays [42]. The efficiencies are determined in intervals of track momentum, track pseudorapidity and event track multiplicity to match the properties of the signal and normalisation modes. Differences between data and simulation associated with the track reconstruction efficiency are corrected using samples of \({{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} \!\rightarrow {\mu ^+\mu ^-} \) decays [43]. The hardware trigger efficiency is determined, using independently selected \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} ({\mu ^+\mu ^-})\,{{K} ^-} \) decays, in intervals of the transverse momenta of the \({\mu ^+} \) and the \({\mu ^-} \) tracks [44]. Discrepancies associated with the \(B \)-production kinematics and the event track multiplicity are corrected using a multivariate algorithm [45], which is trained using \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} ({\mu ^+\mu ^-})\,{{K} ^-} \) decays in background-subtracted data and simulation. As a cross-check, an alternative algorithm is trained using the number of tracks involved in the PV reconstruction as additional input, and consistent results are obtained.

3 Candidate selection

Events are required to pass a hardware trigger that selects events containing at least one muon with high transverse momentum. The \(p_{\textrm{T}} \) threshold is between 1.2 and 2.2 \(\text {\,GeV\!/}c\) depending on the data-taking period. In the subsequent software trigger, at least one muon is required to have \(p_{\textrm{T}} > 0.8\,\text {\,GeV\!/}c \) and impact parameter greater than \(100\,\upmu \text {m} \) with respect to all PVs in the event. The tracks of two or more of the final-state particles are required to form a vertex that is significantly displaced from any PV. A multivariate algorithm [46, 47] is used to identify secondary vertices consistent with the decay of a \(b \) hadron.

The \(B \)  candidates are formed from pairs of well- reconstructed oppositely charged tracks that are identified as muons, combined with an additional track that is identified as either a charged pion or a charged kaon for \({{{B} ^-}} \!\rightarrow {{D} ^{*0}} {{\pi } ^-} \) or \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{K} ^-} \) candidates, respectively. Each track is required to have a good fit quality, a low probability of overlapping with any other track in the event, \(p_{\textrm{T}} > 300\text {\,MeV\!/}c \) and to be inconsistent with originating directly from any PV. To ensure reliable particle identification, each track is further required to have \(5< p < 100\,\text {\,GeV\!/}c \) and \( 1.9< \eta < 4.9\), and events are required to contain fewer than 300 tracks. The \(B \)  candidates are required to have a good-quality vertex fit. The \(B \)  candidates are each associated with the PV giving the smallest value of \(\chi ^2_\textrm{IP}\), which is defined as the difference in the vertex-fit \(\chi ^2\) of the PV reconstructed with and without the candidate. Each \(B \)  candidate must be consistent with originating from its associated PV, with a momentum vector aligned with the direction between the primary and \(B \)-decay vertices.

Each \(B \) candidate is required to have an invariant mass in the range \(5180< m(h^-{\mu ^+\mu ^-}) < 6000 \,\text {\,MeV\!/}c^2 \), where \(h^-\) represents the pion or kaon in the signal or normalisation mode, respectively. The dimuon invariant mass is calculated from the outcome of a kinematic fit in which the \(B \)-candidate invariant mass is constrained to the known \({{B} ^-} \)  mass [25] and the momentum vector is constrained to be consistent with the line of flight between the PV and the decay vertex, thereby improving the resolution. The dimuon invariant mass is required to be in the range \(1900< m({\mu ^+\mu ^-}) < 2100 \,\text {\,MeV\!/}c^2 \) for the signal mode and \(3000< m({\mu ^+\mu ^-}) < 3200 \,\text {\,MeV\!/}c^2 \) for the normalisation mode. The expected signal resolution corresponds to about \(19\,\text {\,MeV\!/}c^2 \) in the \(B \)-candidate invariant mass and about \(7\,\text {\,MeV\!/}c^2 \) in the dimuon invariant mass.

Combinatorial background arising from random combinations of tracks is suppressed using a multivariate classifier. A boosted decision tree (BDT) algorithm [48, 49], as implemented in the TMVA toolkit [50], is trained using supervised learning with ten-fold cross validation [51] to achieve an unbiased classifier response. The BDT classifier is trained to identify the \({{{B} ^-}} \!\rightarrow {{\pi } ^-} {\mu ^+\mu ^-} \) signal candidates irrespective of dimuon invariant mass. The signal sample used for the BDT training comprises simulated nonresonant \({{{B} ^-}} \!\rightarrow {{\pi } ^-} {\mu ^+\mu ^-} \) decays. The background training sample comprises data from the upper sideband in the regions \(5500< m({{\pi } ^-} {\mu ^+\mu ^-}) < 7000\,\text {\,MeV\!/}c^2 \) with vetoes to remove candidates containing \({{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} \!\rightarrow {\mu ^+\mu ^-} \), \({\psi {(2S)}} \!\rightarrow {\mu ^+\mu ^-} \) and \({{B} _{c} ^-} \!\rightarrow {{\pi } ^-} {\mu ^+\mu ^-} \) decays. As no particle identification information is used in the classifier, it can be applied to both the \({{{B} ^-}} \!\rightarrow {{D} ^{*0}} {{\pi } ^-} \) and the \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{K} ^-} \) samples. The features of the data used to classify \(B \)  candidates are: the \(p_{\textrm{T}}\) and IP of the pion and muon tracks; the flight distance, \(p_{\textrm{T}}\) and vertex quality of the \(B \)  candidate; a measure of the isolation of the \(B \)  candidate [52]; the number of tracks used to reconstruct the PV that is most consistent with being the origin of the \(B \)  candidate; and the absolute difference between the momenta of the muons.

The requirements on the BDT output and the particle identification of the charged pion are optimised simultaneously to obtain the best signal sensitivity. The optimisation is based on pseudoexperiments. Ensembles of pseudoexperiments are generated in a grid of points corresponding to different selection requirements. The generation includes only the expected background components: nonresonant \({{{B} ^-}} \!\rightarrow {{\pi } ^-} {\mu ^+\mu ^-} \) decays; \({{{B} ^-}} \!\rightarrow {{K} ^-} {\mu ^+\mu ^-} \) decays, where the \({K} ^-\) is mistakenly identified as a \({\pi } ^-\); and combinatorial background. No \({{{B} ^-}} \!\rightarrow {{D} ^{*0}} ({\mu ^+\mu ^-})\,{{\pi } ^-} \) signal is included as this component is not expected to be significant. The distributions of the nonresonant backgrounds are modelled using simulation, whilst those of the combinatorial background are modelled using sideband data. The sideband excludes \(B \)  candidates inside a signal region of \(\pm 3\) times the expected dimuon invariant-mass signal resolution, centred at the known \({{D} ^{*0}} \) mass [25]. The expected yields for all background components are determined at each grid point by fitting a background-only model to the sideband and interpolating the results. Each pseudoexperiment is fitted in the same way as data (see Sect. 4).

The minimised figure of merit is defined as the expected uncertainty on the signal yield divided by the signal efficiency, as this is proportional to the expected upper limit in the absence of signal. The expected uncertainty on the signal yield is the width of the distribution of residuals in the ensemble. Alternative figures of merit are found to give similar working points. With the optimal requirements, the classifier has a combinatorial background rejection power of \(99.3\%\), whilst retaining \(84.1\%\) of \({{{B} ^-}} \!\rightarrow {{D} ^{*0}} ({\mu ^+\mu ^-})\,{{\pi } ^-} \) decays. The particle identification requirements have a signal efficiency of about \(86\%\), with a charged kaon misidentification rate of about \(3\%\). Each selected event contains only a single \(B \)  candidate. As the classifier separates \(B \)  decay signal from combinatorial background, high-purity samples of \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} ({\mu ^+\mu ^-})\,{{K} ^-} \) decays are obtained using the same classifier requirements, when requiring a positively identified kaon.

Backgrounds from partially reconstructed decays such as \({{\hspace{1.79993pt}\overline{\hspace{-1.79993pt}B}} {}^0} \!\rightarrow {{\rho } ^0} ({{\pi } ^+} {{\pi } ^-})\,{\mu ^+\mu ^-} \) and \({{\hspace{1.79993pt}\overline{\hspace{-1.79993pt}B}} {}^0_{s}} \!\rightarrow f_0({{\pi } ^+} {{\pi } ^-})\,{\mu ^+\mu ^-} \) for the signal mode and \({{\hspace{1.79993pt}\overline{\hspace{-1.79993pt}B}} {}^0} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} ({\mu ^+\mu ^-})\,{{\hspace{1.79993pt}\overline{\hspace{-1.79993pt}K}} {}^{*0}} ({{K} ^-} {{\pi } ^+})\) for the normalisation mode have a reconstructed \(B \)-candidate invariant mass that lies more than \(100\,\text {\,MeV\!/}c^2 \) below the known \({{B} ^-} \)  mass [25]. Consequently, these background components lie outside of the fit range used in the analysis. No significant contributions from other partially reconstructed backgrounds are expected within the dimuon invariant-mass fit ranges for the signal and normalisation modes. This is true for decays with a missing photon, such as \({{{B} ^-}} \!\rightarrow \eta ^{(\prime )}({\mu ^+\mu ^-} \gamma )\,{{\pi } ^-} \), or with one or more missing neutrinos, such as \({{{B} ^-}} \!\rightarrow {{D} ^0} (h^{-}{\mu ^+} {{\nu } _\mu })\,{\mu ^-} {{\overline{\nu }} _\mu } \). Contributions from hadronic backgrounds such as \({{{B} ^-}} \!\rightarrow {{\pi } ^-} {{\pi } ^+} {{\pi } ^-} \) decays, where two pions are mistakenly identified as muons are also estimated using simulation [53] and are found to be negligible.

Unavoidably, a considerable number of nonresonant \({{{B} ^-}} \!\rightarrow {{\pi } ^-} {\mu ^+\mu ^-} \) decays [54] survive the signal selection. This background has a \(B \)-candidate invariant-mass signature identical to the signal, but has a smoothly varying dimuon invariant-mass distribution over the fit range [55]. Possible interference effects between the signal and nonresonant \({{{B} ^-}} \!\rightarrow {{\pi } ^-} {\mu ^+\mu ^-} \) decays are neglected due to the narrow \({{D} ^{*0}} \) meson width. Due to the imperfect pion-kaon separation, misidentified \({{{B} ^-}} \!\rightarrow {{K} ^-} {\mu ^+\mu ^-} \) decays also form an important background. This background favours lower \(B \)-candidate invariant-mass values due to the kaon-pion mass difference, and has a smoothly varying dimuon invariant-mass distribution over the fit range.

For the normalisation mode, the background from misidentified \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} ({\mu ^+\mu ^-})\,{{\pi } ^-} \) decays is small due to its comparatively low branching fraction and the additional suppression from the kaon identification requirement. Nonetheless, due to the large sample size, it is considered in the fit for the normalisation channel.

4 Invariant-mass fits

The \({{{B} ^-}} \!\rightarrow {{D} ^{*0}} ({\mu ^+\mu ^-})\,{{\pi } ^-} \) yield is determined from a two-dimensional extended unbinned maximum-likelihood fit to the \(m({\mu ^+\mu ^-})\) and \(m({{\pi } ^-} {\mu ^+\mu ^-})\) distributions of the selected \(B \)  candidates. The fit model includes four components: signal \({{{B} ^-}} \!\rightarrow {{D} ^{*0}} {{\pi } ^-} \) decays, nonresonant \({{{B} ^-}} \!\rightarrow {{\pi } ^-} {\mu ^+\mu ^-} \) decays, misidentified nonresonant \({{{B} ^-}} \!\rightarrow {{K} ^-} {\mu ^+\mu ^-} \) decays and combinatorial background. The dimuon and the \(B \)-candidate invariant-mass distributions for each component are modelled using empirical analytical functions. The models for the signal and the two nonresonant backgrounds are validated using simulation. No significant dependencies between the two fit variables are observed in simulation or sideband data, so they are treated as uncorrelated.

The signal dimuon invariant-mass distribution is modelled using a Gaussian function with power-law tails on both sides of the distribution [56]. The signal \(B \)-candidate invariant-mass distribution is modelled using the sum of a Gaussian function and a Gaussian function with power-law tails. The tail parameters are fixed to the values obtained from simulation. The signal dimuon and \(B \)-candidate invariant-mass models each include a global shift of peak position and a global scaling factor for the width of the distribution, relative to the values found in simulation.

For the two nonresonant backgrounds, the dimuon invariant-mass distributions are described by first-order polynomial functions, with slope parameters fixed to values obtained from simulation. The \(B \)-candidate invariant-mass distribution for the nonresonant \({{{B} ^-}} \!\rightarrow {{\pi } ^-} {\mu ^+\mu ^-} \) background is modelled with the same function used for the signal, whilst the \({{{B} ^-}} \!\rightarrow {{K} ^-} {\mu ^+\mu ^-} \) background is modelled with a Gaussian function with power-law tails. The tail parameters are fixed to the values obtained from simulation. The \(B \)-candidate invariant-mass models for the two nonresonant background components share the global peak position shift and width scaling factor with the signal \(B \)-candidate invariant-mass model.

For the combinatorial background, the dimuon and the \(B \)-candidate invariant-mass distributions are modelled using a first-order polynomial function and an exponential function, respectively. The dimuon and the \(B \)-candidate invariant-mass slopes are allowed to vary in the fit to data.

In total, the fit includes six free parameters: the yields for each component and the two parameters of the combinatorial background model. The global peak position shift and width scaling factor for each of the dimuon and \(B \)-candidate invariant-mass models are constrained in the fit to be consistent with values obtained from fits to the \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} ({\mu ^+\mu ^-})\,{{K} ^-} \) candidates described below. Figure 1 shows the dimuon and \(B \)-candidate invariant-mass distributions of selected \({{{B} ^-}} \!\rightarrow {{D} ^{*0}} {{\pi } ^-} \) candidates, with results of the fit superimposed. Figure 2 shows the two-dimensional distribution of selected candidates. The fit favours a slightly negative \({{{B} ^-}} \!\rightarrow {{D} ^{*0}} {{\pi } ^-} \) yield, which is attributed to a downward fluctuation of the background in the region close to the \({{B} ^-} \) and \({D} ^{*0}\) meson masses. Table 1 summarises the yields obtained from the fit.

Fig. 1
figure 1

Reconstructed (left) \({\mu ^+\mu ^-} \) and (right) \({{\pi } ^-} {\mu ^+\mu ^-} \) invariant-mass distributions for the selected \({{{B} ^-}} \!\rightarrow {{D} ^{*0}} ({\mu ^+\mu ^-})\,{{\pi } ^-} \) candidates, with results of the fit superimposed

The \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} ({\mu ^+\mu ^-})\,{{K} ^-} \) yield is determined from a one-dimensional extended unbinned maximum-likelihood fit to the \(m({{K} ^-} {\mu ^+\mu ^-})\) distribution. The normalisation mode also serves to obtain the constraints on the correction factors that account for discrepancies between data and simulation in the signal peak position and width. Therefore, an additional maximum-likelihood fit is performed to the \(m({\mu ^+\mu ^-})\) distribution to determine these factors. The B-candidate invariant-mass and dimuon invariant-mass fits to the normalisation mode are independent of each other. A two-dimensional fit is not used since possible correlations in the tail regions of the two observables could cause fit bias due to the large sample size.

The fits to the \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} ({\mu ^+\mu ^-})\,{{K} ^-} \) candidates include three components: \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{K} ^-} \) decays, misidentified \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{\pi } ^-} \) decays, and combinatorial background. The dimuon and the B-candidate invariant-mass distributions for each component are modelled using empirical functions. For the \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{K} ^-} \) component, the dimuon and \(B \)-candidate invariant-mass distributions are modelled with the same functions used for the signal \({{{B} ^-}} \!\rightarrow {{D} ^{*0}} {{\pi } ^-} \) component in the signal-mode fit. For the misidentified \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{\pi } ^-} \) component, both the \(B \)-candidate and the dimuon invariant-mass distributions are modelled using a Gaussian function with power-law tails on both sides. The parameters for the \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{K} ^-} \) and the misidentified \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{\pi } ^-} \) components are determined from simulation, but a global peak position shift and a width scaling factor are allowed to vary in the fits to data.

Fig. 2
figure 2

Two-dimensional distribution of \({\mu ^+\mu ^-} \) invariant mass versus \({{\pi } ^-} {\mu ^+\mu ^-} \) invariant mass for the selected \({{{B} ^-}} \!\rightarrow {{D} ^{*0}} ({\mu ^+\mu ^-})\,{{\pi } ^-} \) candidates. The red box corresponds to a range of about \(\pm 3\sigma \) around the expected signal peak position in each dimension

Combinatorial background is modelled using an exponential function in the \(B \)-candidate invariant-mass distribution and a first-order polynomial function in the dimuon invariant-mass distribution. The \(B \)-candidate and the dimuon invariant-mass slopes are allowed to vary in the fits.

Table 1 Yields obtained from the fit to data described in the text, with statistical uncertainties only

In total, each one-dimensional fit model includes six free parameters: the yields for each component, the global peak shift and width scaling factor and the model parameter of the combinatorial background. Figure 3 shows the dimuon and \(B \)-candidate invariant-mass distributions of selected \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{K} ^-} \) candidates. The dimuon and the B-candidate invariant-mass fits converge to \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{K} ^-} \) yield values that are consistent within \(0.4\%\); the B-candidate invariant-mass fit converges to \((2316 \pm 2)\times 10^{3}\) decays, where the uncertainty is statistical only. The contributions from misidentified \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{\pi } ^-} \) decays and combinatorial background are found to be negligible.

Fig. 3
figure 3

Reconstructed (left) \({\mu ^+\mu ^-} \) and (right) \({{K} ^-} {\mu ^+\mu ^-} \) invariant-mass distributions for the selected \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} ({\mu ^+\mu ^-})\,{{K} ^-} \) candidates, with results of the fits superimposed. The contributions from misidentified \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} ({\mu ^+\mu ^-}){{\pi } ^-} \) decays and combinatorial background are negligible and thus barely visible

5 Systematic uncertainties and results

To obtain the branching fraction of the \({{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} \) decay, the signal yield in the fit described in Sect. 4 is parameterised in terms of \({\mathcal {B}} \left( {{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} \right) \) using Eq. (1). The values of the branching fractions on the right-hand side of the equation, the efficiency ratio and the normalisation yield are allowed to vary within Gaussian constraints to account for the uncertainties on these inputs. Table 2 summarises the constraints. The widths of the constraints correspond to the statistical and systematic uncertainties added in quadrature. For the measured branching fractions, the values from Ref. [25] are used. The efficiency ratio \({\varepsilon _{{{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} {{K} ^-}}}/{\varepsilon _{{{D} ^{*0}} {{\pi } ^-}}}\) is obtained from simulation, accounting for the geometrical acceptance of the detector as well as effects related to the triggering, reconstruction and selection of the \(B \)  candidates. The normalisation yield is obtained from the fits described in the previous section.

Table 2 Input parameters used in the estimation of the \({{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} \) branching fraction. The uncertainties correspond to the statistical and systematic uncertainties added in quadrature

The uncertainty on the efficiency ratio takes into account the simulation sample size, uncertainties on the weights applied to the simulation and the matching between reconstructed and generated particles in the simulation. The systematic uncertainties associated with the weights are evaluated by varying all weights within their uncertainties and by varying the binning used to estimate them. The systematic uncertainty associated with the multivariate weighting algorithm is evaluated by comparing the results obtained with the default and alternative algorithms (see Sect. 2). The systematic uncertainty associated with the matching between reconstructed and generated particles in the simulation is evaluated by comparing the efficiencies obtained including or excluding \(B \)  candidates for which one or more decay products are not correctly matched. All variations are made consistently for the signal and normalisation modes to avoid an overestimation of the uncertainty on the efficiency ratio. The systematic effects associated with the uncertainties on the weights cancel out almost fully in the determination of the efficiency ratio. The effect of the matching between reconstructed and generated particles dominates the uncertainty on the efficiency ratio, but has no significant impact on the modelled invariant-mass distributions.

The systematic uncertainty on the yield of the normalisation mode is evaluated by comparing the yields obtained in four different approaches: the baseline fit to the \(B \)-candidate invariant-mass distribution; a fit replacing the \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} \, {{K} ^-} \) shape by a Hypatia function [57]; the baseline fit to the dimuon invariant-mass distribution; and a fit to the dimuon invariant-mass distribution replacing the \({{{B} ^-}} \!\rightarrow {{J \hspace{-1.66656pt}/\hspace{-1.111pt}\psi }} \, {{K} ^-} \) shape by the sum of a Gaussian function and a Gaussian function with power-law tails on both sides of the distribution. The largest difference is assigned as the systematic uncertainty. The difference between the fit models is also used to assign a systematic uncertainty on the global peak position shifts and width scaling factors for the signal mode.

Including all constraints, the fit to data yields

$$\begin{aligned} {\mathcal {B}} \left( {{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} \right) = (-1.06\pm 1.85)\times 10^{-8}\,, \end{aligned}$$

where statistical and systematic uncertainties are combined. An upper limit on the branching fraction is obtained following the Feldmann–Cousins prescription [58]: pseudoexperiments are generated for various values of the branching fraction and the resulting distribution of measured branching fractions is used to form confidence belts. Figure 4 shows confidence belts at \(90\%\) and \(95\%\) CL. The result obtained from the fit yields

$$\begin{aligned} {\mathcal {B}} ({{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} ) < 2.6\, (3.4) \times 10^{-8} \text { at 90 (95)\% CL}\,. \end{aligned}$$

The procedure is repeated fixing the nuisance parameters to their central values to assess the impact of the systematic uncertainties. In this case, with statistical uncertainty only, the fitted branching fraction is \((-1.10\pm 1.72)\times 10^{-8}\) and the obtained 90 (95)% CL upper limit is \(2.3\, (3.2) \times 10^{-8}\), indicating that the result is statistically limited. As further checks the procedure is repeated restricting the signal yield to positive values, or using alternative signal dimuon and \(B \)-candidate invariant-mass shapes. No significant change in the upper limit is found in any of these checks.

Fig. 4
figure 4

Confidence belts generated using pseudoexperiments according to the Feldman–Cousins prescription [58]. The vertical black line shows the result of the fit to data

6 Summary

A search for the \({{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} \) decay is performed by analysing \({{{B} ^-}} \!\rightarrow {{\pi } ^-} {\mu ^+\mu ^-} \) decays. The analysis uses a data sample corresponding to an integrated luminosity of \(9\text {\,fb} ^{-1} \) collected with the LHCb experiment in pp collisions between 2011 and 2018. This is the first search for a rare charm-meson decay exploiting its production in beauty-meson decays. No excess with respect to the background-only hypothesis is observed and an upper limit of \({\mathcal {B}} ({{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} ) < 2.6 \times 10^{-8}\) at \(90\%\) CL is set. This measurement is the first limit on the branching fraction of \({{D} ^{*0}} \!\rightarrow {\mu ^+\mu ^-} \) decays and the most stringent limit on \({D} ^{*0}\) decays to leptonic final states.