First evidence for the annihilation decay mode $B^{+} \to D_{s}^{+} \phi$

Evidence for the hadronic annihilation decay mode $B^{+} \to D_s^{+}\phi$ is found with greater than $3\sigma$ significance. The branching fraction and $CP$ asymmetry are measured to be $\mathcal{B}(B^{+} \to D_s^{+}\phi) = (1.87^{+1.25}_{-0.73}({\rm stat}) \pm 0.19 ({\rm syst}) \pm 0.32 ({\rm norm})) \times 10^{-6}$ and $\mathcal{A}_{CP}(B^{+} \to D_s^{+}\phi) = -0.01 \pm 0.41 ({\rm stat}) \pm 0.03 ({\rm syst})$. The last uncertainty on $\mathcal{B}(B^{+} \to D_s^{+}\phi)$ is from the branching fractions of the $B^+ \to D_s^+ \kern 0.2em\bar{\kern -0.2em D}{}^0$ normalization mode and intermediate resonance decays. Upper limits are also set for the branching fractions of the related decay modes $B^{+}_{(c)} \to D^{+}_{(s)} K^{*0}$, $B^{+}_{(c)} \to D^{+}_{(s)} \kern 0.2em\bar{\kern -0.2em K}{}^{*0}$ and ${B_c^{+} \to D^{+}_{s}\phi}$, including the result ${\mathcal{B}(B^+ \to D^+ K^{*0})}<1.8 \times 10^{-6}$ at the 90% credibility level.


Introduction
The decays $B^+ \to D_s^+\phi$, $D^+K^{*0}$ and $D_s^+\bar{K}^{*0}$ occur in the Standard Model (SM) via annihilation of the quarks forming the $B^+$ meson into a virtual $W^+$ boson (Fig. 1). There is currently strong interest in annihilation-type decays of $B^+$ mesons due, in part, to the roughly $2\sigma$ deviation above the SM prediction observed in the branching fraction of $B^+ \to \tau^+\nu$ [1, 2]. Annihilation diagrams of $B^+$ mesons are highly suppressed in the SM; no hadronic annihilation-type decays of the $B^+$ meson have been observed to date. Branching fraction predictions (neglecting rescattering) for $B^+ \to D_s^+\phi$ and $B^+ \to D^+K^{*0}$ are $(1-7) \times 10^{-7}$ in the SM [3][4][5][6], where the precision of the calculations is limited by hadronic uncertainties. The branching fraction for the $B^+ \to D_s^+\bar{K}^{*0}$ decay mode is expected to be about 20 times smaller due to the CKM quark-mixing matrix elements involved. The current upper limits on the branching fractions of these decay modes are $\mathcal{B}(B^+ \to D_s^+\phi) < 1.9 \times 10^{-6}$ [7], $\mathcal{B}(B^+ \to D^+K^{*0}) < 3.0 \times 10^{-6}$ [8] and $\mathcal{B}(B^+ \to D_s^+\bar{K}^{*0}) < 4.0 \times 10^{-4}$ [9], all at the 90% confidence level. Contributions from physics beyond the SM (BSM) could greatly enhance these branching fractions and/or produce a large $CP$ asymmetry [4,5]. For example, a charged Higgs ($H^+$) boson could mediate the annihilation process. Interference between the $W^+$ and $H^+$ amplitudes could result in a $CP$ asymmetry if the two amplitudes are of comparable size and have nonzero strong and weak phase differences. An $H^+$ contribution to the amplitude could also significantly increase the branching fraction.
In this paper, first evidence for the decay mode $B^+ \to D_s^+\phi$ is presented using 1.0 fb$^{-1}$ of data collected by LHCb in 2011 from $pp$ collisions at a center-of-mass energy of 7 TeV. The branching fraction and $CP$ asymmetry are measured. Limits are set on the branching fractions of the decay modes $B^+ \to D^+K^{*0}$ and $B^+ \to D_s^+\bar{K}^{*0}$, along with the highly suppressed decay modes $B^+ \to D^+\bar{K}^{*0}$ and $B^+ \to D_s^+K^{*0}$. Limits are also set on the product of the production rate and branching fraction for $B_c^+$ decays to the final states $D_s^+\phi$, $D^+_{(s)}K^{*0}$ and $D^+_{(s)}\bar{K}^{*0}$.

The LHCb experiment
The LHCb detector [10] is a single-arm forward spectrometer covering the pseudorapidity range $2 < \eta < 5$, designed for the study of particles containing $b$ or $c$ quarks. The detector includes a high precision tracking system consisting of a silicon-strip vertex detector surrounding the $pp$ interaction region, a large-area silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip detectors and straw drift tubes placed downstream. The combined tracking system has a momentum resolution $\Delta p/p$ that varies from 0.4% at 5 GeV/$c$ to 0.6% at 100 GeV/$c$, and an impact parameter resolution of 20 µm for tracks with high transverse momentum ($p_{\rm T}$). Discrimination between different types of charged particles is provided by two ring-imaging Cherenkov detectors [11]. Photon, electron and hadron candidates are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter. Muons are identified by a muon system composed of alternating layers of iron and multiwire proportional chambers. The LHCb trigger [12] consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage which applies a partial event reconstruction (only tracks with $p_{\rm T} > 0.5$ GeV/$c$ are used). The software stage of the LHCb trigger builds two-, three- and four-track partial $b$-hadron candidates that are required to be significantly displaced from the primary interaction and have a large sum of $p_{\rm T}$ in their tracks. At least one of the tracks used to form the trigger candidate must have $p_{\rm T} > 1.7$ GeV/$c$ and an impact parameter $\chi^2$ with respect to the primary interaction, $\chi^2_{\rm IP}$, greater than 16. The $\chi^2_{\rm IP}$ is defined as the difference between the $\chi^2$ of the primary interaction vertex reconstructed with and without the considered track.
A boosted decision tree (BDT) [13,14] is used to distinguish between trigger candidates originating from $b$-hadron decays and those that originate from prompt $c$-hadrons or combinatorial background. The BDT provides a pure sample of $b\bar{b}$ events for offline analysis.
For the simulation, pp collisions are generated using Pythia 6.4 [15] with a specific LHCb configuration [16]. Decays of hadronic particles are described by EvtGen [17] in which final state radiation is generated using Photos [18]. The interaction of the generated particles with the detector and its response are implemented using the Geant4 toolkit [19] as described in Ref. [20].

Event selection
Candidates for the decays under study are formed from tracks that are required to have $p_{\rm T} > 0.1$ GeV/$c$, $\chi^2_{\rm IP} > 4$ and $p > 1$ GeV/$c$. For the $\phi$ and $K^{*0}$ decay products the momentum requirement is increased to $p > 2$ GeV/$c$. These momentum requirements are 100% efficient on simulated signal events. The $D_s^+ \to K^+K^-\pi^+$, $D^+ \to K^-\pi^+\pi^+$, $\phi \to K^+K^-$ and $K^{*0} \to K^+\pi^-$ candidates are required to have invariant masses within 25, 25, 20 and 50 MeV/$c^2$ of their respective world-average (PDG) values [21]. The mass resolutions for $D_s^+ \to K^+K^-\pi^+$ and $D^+ \to K^-\pi^+\pi^+$ are about 7 MeV/$c^2$ and 8 MeV/$c^2$, respectively. The decay chain is fit constraining the $D^+_{(s)}$ candidate mass to its PDG value. The $D^+_{(s)}$ vertex is required to be downstream of the $B^+$ vertex and the p-value formed from $\chi^2_{\rm IP} + \chi^2_{\rm vertex}$ of the $B^+$ candidate is required to be greater than 0.1%. Backgrounds from charmless decays are suppressed by requiring significant separation between the $D^+_{(s)}$ and $B^+$ decay vertices. This requirement reduces contributions from charmless backgrounds by a factor of about 15 while retaining 87% of the signal.
Cross-feed between $D^+$ and $D_s^+$ candidates can occur if one of the child tracks is misidentified. If a $D_s^+ \to K^+K^-\pi^+$ candidate can also form a $D^+ \to K^-\pi^+\pi^+$ candidate that falls within 25 MeV/$c^2$ of the PDG $D^+$ mass, then it is rejected unless either $|m_{KK} - m_{\phi}^{\rm PDG}| < 10$ MeV/$c^2$ or the ambiguous child track satisfies a stringent kaon particle identification (PID) requirement. This reduces the $D^+ \to D_s^+$ cross-feed by a factor of about 200 at the expense of only 4% of the signal. For decay modes that contain a $D^+$ meson, a $D^+ \to K^-\pi^+\pi^+$ candidate that can also form a $D_s^+ \to K^-K^+\pi^+$ candidate whose mass is within 25 MeV/$c^2$ of the PDG $D_s^+$ mass is rejected if either $|m_{KK} - m_{\phi}^{\rm PDG}| < 10$ MeV/$c^2$ or the ambiguous child track fails a stringent pion PID requirement. For all modes, $\Lambda_c^+ \to D^+_{(s)}$ cross-feed (from the $\Lambda_c^+ \to pK^-\pi^+$ decay mode) is suppressed using similar requirements.
When a pseudoscalar particle decays into a pseudoscalar and a vector, $V$, the spin of the vector particle (in this case a $\phi$ or $K^{*0}$) must be orthogonal to its momentum to conserve angular momentum; i.e., the vector particle must be longitudinally polarized. For a longitudinally polarized $\phi$ ($K^{*0}$) decaying into the $K^+K^-$ ($K^+\pi^-$) final state, the angular distribution of the $K^+$ meson in the $V$ rest frame is proportional to $\cos^2\theta_K$, where $\theta_K$ is the angle between the momenta of the $K^+$ and $B^+$ in the $V$ rest frame. The requirement $|\cos\theta_K| > 0.4$, which is 93% efficient on signal and rejects about 40% of the background, is applied in this analysis.
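The quoted efficiencies of the angular requirement can be checked with a short numerical integration. The sketch below is only an illustration: it assumes a pure $\cos^2\theta_K$ shape for the signal and a background that is flat in $\cos\theta_K$, and the function name `fraction_passing` is introduced here for the example.

```python
def fraction_passing(cut, pdf, n_steps=100_000):
    """Fraction of a distribution symmetric in cos(theta) with |cos(theta)| > cut.

    Uses a midpoint-rule integration over cos(theta) in [0, 1]; by symmetry
    the negative half gives the same fraction.
    """
    step = 1.0 / n_steps
    total = 0.0
    passed = 0.0
    for i in range(n_steps):
        x = (i + 0.5) * step
        weight = pdf(x)
        total += weight
        if x > cut:
            passed += weight
    return passed / total


# Signal: longitudinally polarized vector meson, density proportional to cos^2(theta).
signal_eff = fraction_passing(0.4, lambda c: c * c)

# Background: assumed flat in cos(theta) (an assumption of this sketch).
flat_eff = fraction_passing(0.4, lambda c: 1.0)

print(f"signal efficiency   : {signal_eff:.3f}")
print(f"background rejected : {1.0 - flat_eff:.3f}")
```

Analytically the signal fraction is $1 - 0.4^3 = 0.936$ and a flat background loses 40% of its events, in line with the numbers quoted in the text.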
Four BDTs that identify $D_s^+ \to K^+K^-\pi^+$, $D^+ \to K^-\pi^+\pi^+$, $\phi \to K^+K^-$ and $K^{*0} \to K^+\pi^-$ candidates originating from $b$-hadron decays are used to suppress the backgrounds. The BDTs are trained using large clean $D^+_{(s)}$, $\phi$ and $K^{*0}$ samples obtained from $\bar{B}^0_{(s)} \to D^+_{(s)}\pi^-$, $B^0_s \to J/\psi\,\phi$ and $B^0 \to J/\psi\,K^{*0}$ data, respectively, where the backgrounds are subtracted using the sPlot technique [22]. Background samples for the training are taken from the $D^+_{(s)}$, $\phi$ and $K^{*0}$ sidebands in the same data samples. The BDTs take advantage of the kinematic similarity of all $b$-hadron decays and avoid using any topology-dependent information. The BDTs use kinematic, track quality, vertex and PID information to obtain a high level of background suppression. In total, 23 properties per child track and five properties from the parent $D^+_{(s)}$, $\phi$ or $K^{*0}$ meson are used in each BDT. The boosting method used is known as bagging [23], which produces BDT response values in the unit interval.
A requirement is made on the product of the BDT responses of the $D^+_{(s)}$ and $\phi$ or $K^{*0}$ candidates. Tests on several $B^0_{(s)} \to D\bar{D}$ decay modes show that this provides the best performance [24]. The efficiencies of these requirements are obtained using large $\bar{B}^0_{(s)} \to D^+_{(s)}\pi^-$, $B^0_s \to J/\psi\,\phi$ and $B^0 \to J/\psi\,K^{*0}$ data samples that are not used in the BDT training. The efficiency calculation takes into account the kinematic differences between the signal and training decay modes using additional input from simulated data. Correlations between the properties of the $D^+_{(s)}$ and $\phi$ or $K^{*0}$ mesons in a given $B^+$ candidate are also accounted for.
The optimal BDT requirements are chosen such that the signal significance is maximized for the central value of the available SM branching fraction predictions. The signal efficiency of the optimal BDT requirement is 51%, 69% and 51% for the $B^+ \to D_s^+\phi$, $B^+ \to D^+K^{*0}$ and $B^+ \to D_s^+\bar{K}^{*0}$ decay modes, respectively. The final sample contains no events with multiple candidates. Finally, no consideration is given to contributions where the $K^+K^-$ ($K^+\pi^-$) system is in an S-wave state or comes from the tails of higher $\phi$ ($K^{*0}$) resonances. Such contributions are neglected as they are expected to be much smaller than the statistical uncertainties.

Branching fraction for the $B^+ \to D_s^+\phi$ decay
The $B^+ \to D_s^+\phi$ yield is determined by performing an unbinned maximum likelihood fit to the invariant mass spectra of $B^+$ candidates. Candidates failing the $\cos\theta_K$ and/or $m_{KK}$ selection criteria that are within 40 MeV/$c^2$ of $m_{\phi}^{\rm PDG}$ are used in the fit to help constrain the background probability density function (PDF). The data set comprises the four subsamples given in Table 1. They are fit simultaneously to a PDF with the following components:
• $B^+ \to D_s^+\phi$: A Gaussian function whose parameters are taken from simulated data and fixed in the fit is used for the signal shape. The fraction of signal events in each of the subsamples is also fixed from simulation to be as follows: (A) 89%; (B) 4%; (C) 7% and (D) no signal expected. Thus, almost all signal events are expected to be found in region A, while region D should contain only background. A 5% systematic uncertainty is assigned to the branching fraction determination due to the shape of the signal PDF. This value is obtained by considering the effect on the branching fraction for many variations of the signal PDFs for $B^+ \to D_s^+\phi$ and the normalization decay mode.
• $B^+ \to D_s^{*+}\phi$: The $\phi$ in this decay mode does not need to be longitudinally polarized. When the photon from the $D_s^{*+}$ decay is not reconstructed, the polarization affects both the invariant mass distribution and the fraction of events in each of the subsamples. Studies using a wide range of polarization fractions, with shapes taken from simulation, show that the uncertainties in this PDF have a negligible impact on the signal yield.
• $\bar{B}^0_s \to D_s^{(*)+}K^-K^{*0}$: These decay modes, which arise as backgrounds to $B^+ \to D_s^+\phi$ when the pion from the $K^{*0}$ decay is not reconstructed, have not yet been observed; however, they are expected to have branching fractions similar to those of the related decay modes of Ref. [25]. The fraction of events in each subsample is constrained by simulation. Removing these constraints results in a 1% change in the signal yield.
• Combinatorial background: An exponential shape is used for this component. The exponent is fixed to be the same in all four subsamples. This component is assumed to be uniformly distributed in $\cos\theta_K$. Removing these constraints produces shifts in the signal yield of up to 5%; thus, a 5% systematic uncertainty is assigned to the branching fraction measurement.
To summarize, the parameters allowed to vary in the fit are the signal yield, the yield and longitudinal polarization fraction of $B^+ \to D_s^{*+}\phi$, the yield of $\bar{B}^0_s \to D_s^{(*)+}K^-K^{*0}$ in each subsample, the combinatorial background yield in each subsample and the combinatorial exponent. Figure 2 shows the $B^+$ candidate invariant mass spectra for each of the four subsamples, along with the various components of the PDF. The signal yield is found to be $6.7^{+4.5}_{-2.6}$, where the confidence interval includes all values of the signal yield for which $\log(L_{\rm max}/L) < 0.5$. The statistical significance of the signal is found using Wilks' theorem [26] to be $3.6\sigma$. A simulation study consisting of an ensemble of $10^5$ data sets confirms the significance and also the accuracy of the coverage to within a few percent. All of the variations in the PDFs discussed above result in significances above $3\sigma$; thus, evidence for $B^+ \to D_s^+\phi$ is found at greater than $3\sigma$ significance including systematics.
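The two likelihood-based quantities used above, the interval defined by $\log(L_{\rm max}/L) < 0.5$ and the Wilks' theorem significance $\sqrt{2\log(L_{\rm max}/L_0)}$, can be illustrated with a toy single-bin counting experiment. This is a sketch only and not the simultaneous unbinned fit of the analysis: the Poisson model, the function names and the inputs `n_obs = 6` and `b = 0.75` are assumptions chosen for illustration, so the output is not expected to reproduce the fitted yield or the $3.6\sigma$ quoted in the text.

```python
import math


def log_likelihood(s, n_obs, b):
    """Poisson log-likelihood (constant terms dropped) for mean s + b; needs b > 0."""
    mu = s + b
    return n_obs * math.log(mu) - mu


def wilks_significance(n_obs, b):
    """Significance from the likelihood ratio of best-fit signal vs. s = 0."""
    s_hat = max(n_obs - b, 0.0)
    delta = log_likelihood(s_hat, n_obs, b) - log_likelihood(0.0, n_obs, b)
    return math.sqrt(2.0 * delta)


def likelihood_interval(n_obs, b, delta_max=0.5, n_steps=20000):
    """Scan s >= 0 and return the range where log(L_max / L) < delta_max."""
    s_hat = max(n_obs - b, 0.0)
    l_max = log_likelihood(s_hat, n_obs, b)
    s_max = 4.0 * max(n_obs, 1)
    inside = []
    for i in range(n_steps + 1):
        s = s_max * i / n_steps
        if l_max - log_likelihood(s, n_obs, b) < delta_max:
            inside.append(s)
    return inside[0], inside[-1]


# Illustrative inputs for the toy model (not the fit configuration of the paper).
n_obs, b = 6, 0.75
z = wilks_significance(n_obs, b)
low, high = likelihood_interval(n_obs, b)
print(f"toy significance: {z:.2f} sigma, interval: [{low:.2f}, {high:.2f}]")
```

The asymmetric interval that comes out of the scan shows why the quoted yield has unequal upper and lower uncertainties at low statistics.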
The $B^+ \to D_s^+\phi$ branching fraction is normalized to $\mathcal{B}(B^+ \to D_s^+\bar{D}^0)$. The selection for the normalization mode, which is similar to that used here for $B^+ \to D_s^+\phi$, is described in detail in Ref. [24]. The ratio of the efficiency of the product of the geometric, trigger, reconstruction and selection (excluding the charmless background suppression and BDT) requirements of the signal mode to the normalization mode is found from simulation to be $0.93 \pm 0.05$. The ratio of BDT efficiencies, which include all usage of PID information, is determined from data (see Sect. 3) to be $0.52 \pm 0.02$. The large branching fraction of the normalization mode permits using a BDT requirement that is nearly 100% efficient. For the charmless background suppression requirement, the efficiency ratio is determined from simulation to be $1.15 \pm 0.01$. The difference is mostly due to the fact that the normalization mode has two charmed mesons, while the signal mode only has one.
Figure 2: $B^+$ candidate invariant mass spectra for the four subsamples, which are defined in Table 1 and labelled on the panels. The PDF components are as given in the legend.
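The three efficiency ratios quoted above enter the normalization as a product. A minimal sketch of that combination follows; it assumes the three uncertainties are uncorrelated and can be added in quadrature as relative errors, which is an assumption of this example and not a statement taken from the text. The helper name `product_with_uncertainty` is hypothetical.

```python
import math


def product_with_uncertainty(factors):
    """Multiply (value, sigma) pairs, adding relative uncertainties in quadrature.

    Assumes the factors are uncorrelated (an assumption of this sketch).
    """
    value = 1.0
    rel_var = 0.0
    for central, sigma in factors:
        value *= central
        rel_var += (sigma / central) ** 2
    return value, value * math.sqrt(rel_var)


# Signal-to-normalization efficiency ratios quoted in the text.
ratios = [
    (0.93, 0.05),  # geometric, trigger, reconstruction and selection
    (0.52, 0.02),  # BDT (including PID)
    (1.15, 0.01),  # charmless background suppression
]

total, total_err = product_with_uncertainty(ratios)
print(f"combined efficiency ratio: {total:.3f} +/- {total_err:.3f}")
```

Under these assumptions the combined ratio is about 0.56 with a relative uncertainty of roughly 7%, dominated by the simulation-based selection ratio.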
The SM predicts the branching fraction ratios $\mathcal{B}(\ldots)$ [3]. The partially reconstructed backgrounds are expected to be much larger in these channels compared to $B^+ \to D_s^+\phi$, mainly due to the large $K^{*0}$ mass window. Producing an exhaustive list of decay modes that contribute to each of these backgrounds is not feasible; thus, reliable PDFs for the backgrounds are not available. Instead, data in the sidebands around the signal region are used to estimate the expected background yield in the signal region. The signal region is chosen to be $\pm 2\sigma$ around the $B^+$ mass, where $\sigma = 13.8$ MeV/$c^2$ is determined from simulation. Our prior knowledge about the background can be stated as the following three assumptions: (1) the slope is negative, which will be true provided $b$-baryon background contributions are not too large; (2) it does not peak or form a shoulder; and (3) the background yield is non-negative. These background properties are assumed to hold throughout the signal and sideband regions. To convert these assumptions into background expectations, ensembles of background-only data sets are generated using the observed data in the sidebands and assuming Poisson distributed yields. For each simulated data set, all interpolations into the signal region that satisfy our prior assumptions are assigned equal probability. These probabilities are summed over all data sets to produce background yield PDFs, all of which are well described by Gaussian lineshapes (truncated at zero) with the parameters $\mu_{\rm bkgd}$ and $\sigma_{\rm bkgd}$ given in Table 3. The $B^+$ candidate invariant mass distributions, along with the background expectations, are shown in Fig. 3. The results of spline interpolation using data in the sideband bins, along with the 68% confidence intervals obtained by propagating the Poisson uncertainties in the sidebands to the splines, are shown for comparison. As expected, the spline interpolation results, which involve a stronger set of assumptions, have less statistical uncertainty.
A Bayesian approach [27] is used to set the upper limits. Poisson distributions are assumed for the observed candidate counts and uniform, non-negative prior PDFs for the signal branching fractions. The systematic uncertainties in the efficiency and $B^+ \to D_s^+\bar{D}^0$ normalization are encoded in log-normal priors, while the background prior PDFs are the truncated Gaussian lineshapes discussed above. The posterior PDF, $p(\mathcal{B}|n_{\rm obs})$, where $n_{\rm obs}$ is the number of candidates observed in the signal region, is computed by integrating over the background, efficiency and normalization. The 90% credibility level (CL) upper limit, $\mathcal{B}_{90}$, is the value of the branching fraction for which $\int_0^{\mathcal{B}_{90}} p(\mathcal{B}|n_{\rm obs})\,d\mathcal{B} = 0.9 \int_0^{\infty} p(\mathcal{B}|n_{\rm obs})\,d\mathcal{B}$. The upper limits are given in Table 3. The limit on $B^+ \to D^+K^{*0}$ is 1.7 times lower than any previous limit, while the $B^+ \to D_s^+\bar{K}^{*0}$ limit is 91 times lower. For the highly suppressed decay modes $B^+ \to D^+\bar{K}^{*0}$ and $B^+ \to D_s^+K^{*0}$ these are the first limits to be set.
Figure 3: $B^+$ candidate invariant mass distributions with the background expectations (see Table 3) used for the limit calculations; they are taken from the truncated-Gaussian priors as discussed in the text. Spline interpolation results (solid blue line and hashed blue areas) are shown for comparison.
Annihilation amplitudes are expected to be much larger for $B_c^+$ decays due to the large ratio $|V_{cb}/V_{ub}|$. In addition, the $B_c^+ \to D_s^+\phi$, $D^+K^{*0}$, $D_s^+\bar{K}^{*0}$ decay modes can also proceed via penguin-type diagrams. However, because $B_c^+$ mesons are produced much more rarely than $B^+$ mesons in 7 TeV $pp$ collisions (the ratio of $B_c^+$ to $B^+$ mesons produced is denoted by $f_c/f_u$), no signal events are expected to be observed in any of these $B_c^+$ channels. The Bayesian approach is again used to set the limits. A different choice is made here for the background prior PDFs because the background levels are so low.
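As a rough illustration of the Bayesian limit calculation with a truncated-Gaussian background prior (the choice used for the $B^+$ modes), the sketch below computes a 90% credibility upper limit on the expected signal yield by direct numerical integration. It is not the analysis code: the grid-based marginalization, the function names and all numerical inputs (`bkg_mu`, `bkg_sigma`, `rel_syst`, the observed count) are hypothetical, and converting the yield limit into a branching-fraction limit would need the efficiency and normalization inputs, which are not reproduced here.

```python
import math


def poisson_pmf(n, mu):
    """Poisson probability of n counts for mean mu."""
    if mu <= 0.0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))


def truncated_gaussian_grid(mu, sigma, n_points=40):
    """(value, weight) grid for a Gaussian background prior truncated at zero."""
    upper = max(mu, 0.0) + 5.0 * sigma
    step = upper / n_points
    points = [(i + 0.5) * step for i in range(n_points)]
    weights = [math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in points]
    norm = sum(weights)
    return [(x, w / norm) for x, w in zip(points, weights)]


def lognormal_grid(rel_sigma, n_points=21):
    """(scale, weight) grid for a log-normal prior with median 1."""
    if rel_sigma <= 0.0:
        return [(1.0, 1.0)]
    sigma_ln = math.sqrt(math.log(1.0 + rel_sigma ** 2))
    zs = [-4.0 * sigma_ln + 8.0 * sigma_ln * i / (n_points - 1)
          for i in range(n_points)]
    weights = [math.exp(-0.5 * (z / sigma_ln) ** 2) for z in zs]
    norm = sum(weights)
    return [(math.exp(z), w / norm) for z, w in zip(zs, weights)]


def upper_limit(n_obs, bkg_mu, bkg_sigma, rel_syst, cl=0.90, s_max=30.0, n_s=300):
    """Upper limit on the signal yield s for a uniform, non-negative prior on s."""
    b_grid = truncated_gaussian_grid(bkg_mu, bkg_sigma)
    e_grid = lognormal_grid(rel_syst)
    step = s_max / n_s
    s_values = [i * step for i in range(n_s + 1)]
    # Posterior (unnormalized): likelihood marginalized over nuisance priors.
    posterior = []
    for s in s_values:
        p = 0.0
        for eps, w_eps in e_grid:
            for b, w_b in b_grid:
                p += w_eps * w_b * poisson_pmf(n_obs, eps * s + b)
        posterior.append(p)
    # Trapezoidal cumulative integral; return first grid point reaching cl.
    areas = [0.5 * (posterior[i] + posterior[i + 1]) * step for i in range(n_s)]
    total = sum(areas)
    running = 0.0
    for i, area in enumerate(areas):
        running += area
        if running >= cl * total:
            return s_values[i + 1]
    return s_max


# Hypothetical inputs, chosen only to exercise the function.
limit = upper_limit(n_obs=3, bkg_mu=1.0, bkg_sigma=0.5, rel_syst=0.10)
print(f"90% credibility upper limit on signal yield: {limit:.1f}")
```

A useful sanity check of such a calculation is the zero-observed-events case, where the background prior factorizes out and the limit approaches the familiar value of about 2.3 signal events.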
The background prior PDFs are now taken to be Poisson distributions, where the observed background counts are obtained using regions of equal size to the signal regions in the high-mass sidebands. Only the high-mass sidebands are used to avoid possible contamination from partially reconstructed $B_c^+$ backgrounds. In none of the decay modes is more than a single candidate seen across the combined signal and background regions. The limits obtained, which are set on the product of $f_c/f_u$ and the branching fractions (see Table 4), are four orders of magnitude better than any previous limit set for a $B_c^+$ decay mode that does not contain charmonium. As expected given the small numbers of candidates observed, the limits have some dependence on the choice made for the signal prior PDF. As a cross check, the limits were also computed using various frequentist methods. The largest difference found is 20%.
To measure the $CP$ asymmetry, $\mathcal{A}_{CP}$, in $B^+ \to D_s^+\phi$, only candidates in region A and in a $\pm 2\sigma$ window ($\pm 26.4$ MeV/$c^2$) around the $B^+$ mass are considered. The number of $B^+$ candidates is $n_+ = 3$, while the number of $B^-$ candidates is $n_- = 3$. The integral of the background PDF from the fit described in detail in Sect. 4 in the signal region is $n_{\rm bkgd} = 0.75$ (the background is assumed to be charge symmetric). The observed charge asymmetry is $\mathcal{A}_{\rm obs} = (n_- - n_+)/(n_- + n_+ - n_{\rm bkgd}) = 0.00 \pm 0.41$, where the 68% confidence interval is obtained using the Feldman-Cousins method [28].
To obtain $\mathcal{A}_{CP}$, the production, $\mathcal{A}_{\rm prod}$, reconstruction, $\mathcal{A}_{\rm reco}$, and selection, $\mathcal{A}_{\rm sel}$, asymmetries must also be accounted for. The $D_s^+\phi$ final state is charge symmetric except for the pion from the $D_s^+$ decay. The observed charge asymmetry in the decay modes $B^+ \to J/\psi\,K^+$ and $B^+ \to \bar{D}^0\pi^+$, along with the interaction asymmetry of charged kaons [29] and the pion-detection asymmetry [30] in LHCb, are used to obtain the estimate $\mathcal{A}_{\rm prod} + \mathcal{A}_{\rm reco} = (-1 \pm 1)\%$. The large $\bar{B}^0_s \to D_s^+\pi^-$ sample used to determine the BDT efficiency is employed to estimate the selection charge asymmetry, yielding $\mathcal{A}_{\rm sel} = (2 \pm 3)\%$, where the precision is limited by the sample size. Finally, the $CP$ asymmetry is found to be $\mathcal{A}_{CP}(B^+ \to D_s^+\phi) = \mathcal{A}_{\rm obs} - \mathcal{A}_{\rm prod} - \mathcal{A}_{\rm reco} - \mathcal{A}_{\rm sel} = -0.01 \pm 0.41\,({\rm stat}) \pm 0.03\,({\rm syst})$, which is consistent with the SM expectation of no observable $CP$ violation.
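The arithmetic behind the central value and systematic uncertainty of $\mathcal{A}_{CP}$ can be retraced in a few lines. The sketch below uses only the numbers quoted in the text; combining the two correction uncertainties in quadrature is an assumption of this example (it reproduces the quoted 0.03), and the variable names are chosen here for illustration.

```python
import math

# Candidate counts and fitted background in the signal window (from the text).
n_plus = 3      # B+ candidates
n_minus = 3     # B- candidates
n_bkgd = 0.75   # background expectation, assumed charge symmetric

# Observed charge asymmetry, following the definition given in the text.
a_obs = (n_minus - n_plus) / (n_minus + n_plus - n_bkgd)

# Corrections quoted in the text: (central value, uncertainty).
a_prod_reco, err_prod_reco = -0.01, 0.01  # production + reconstruction
a_sel, err_sel = 0.02, 0.03               # selection

# Corrected CP asymmetry.
a_cp = a_obs - a_prod_reco - a_sel

# Systematic uncertainty, assuming the two corrections are uncorrelated.
syst = math.sqrt(err_prod_reco ** 2 + err_sel ** 2)

print(f"A_obs = {a_obs:.2f}")
print(f"A_CP  = {a_cp:.2f} +/- {syst:.2f} (syst)")
```

The statistical uncertainty of 0.41 comes from the Feldman-Cousins construction and is not reproduced by this sketch.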

Summary
The decay mode $B^+ \to D_s^+\phi$ is seen with greater than $3\sigma$ significance. This is the first evidence found for a hadronic annihilation-type decay of a $B^+$ meson. The branching fraction and $CP$ asymmetry for $B^+ \to D_s^+\phi$ are consistent with the SM predictions. Limits have also been set for the branching fractions of the decay modes $B^+_{(c)} \to D^+_{(s)}K^{*0}$, $B^+_{(c)} \to D^+_{(s)}\bar{K}^{*0}$ and $B_c^+ \to D_s^+\phi$. These limits are the best set to date.