Probing flavoured Axions in the Tail of $B_q \to \mu^+\mu^-$

We discuss how LHC di-muon data collected to study $B_q \to \mu\mu$ can be used to constrain light particles with flavour-violating couplings to $b$-quarks. Focussing on the case of a flavoured QCD axion, $a$, we compute the decay rates for $B_q \to \mu \mu a$ and the SM background process $B_q \to \mu \mu \gamma$ near the kinematic endpoint. These rates depend on non-perturbative $B_q \to \gamma^{(*)}$ form factors with on- or off-shell photons. The off-shell form factors -- relevant for generic searches for beyond-the-SM particles -- are discussed in full generality and computed with QCD sum rules for the first time. With these results, we analyse available LHCb data to obtain the sensitivity on $B_q \to \mu \mu a$ at present and future runs. We find that the full LHCb dataset alone will allow to probe axion-coupling scales of the order of $10^6$ GeV for both $b\to d$ and $b \to s$ transitions.


Introduction and Motivation
Open questions in particle physics and cosmology may well be addressed by very light particles that interact only feebly with the Standard Model (SM).e prime example is the QCD axion [1,2], which is not only predicted within the Peccei-inn (PQ) [3,4] solution to the strong CP Problem, but which can also explain the Dark Ma er abundance if it is su ciently lighter than the meV scale [5][6][7].In the past years much activity has been devoted towards experimental searches for the QCD axion, and multiple proposals for new experiments are underway to complement ongoing e orts to discover the axion, see Ref. [8] for a review.
While most axion searches rely on axion couplings to photons, the axion also couples to SM fermions if they are charged under the PQ symmetry.Generically, these charges constitute new sources of avour violation, which induce avour-violating axion couplings to fermions, which can thus be probed by precision avour experiments.For instance, this situation arises naturally when the PQ symmetry is identi ed with a avour symmetry that shapes the hierarchical structure of the SM Yukawas [9][10][11][12], therefore, connecting the strong CP problem with the SM avour puzzle.Even in the absence of such a connection, axion models with avour non-universal PQ charges can be easily constructed and motivated by, e.g., stellar cooling anomalies that require suppressed axion couplings to nucleons [13][14][15].
In the absence of explicit models, the couplings of the axion to di erent avours are a priori unrelated, and are parametrised by a model-independent e ective Lagrangian for Goldstone bosons.e avourviolating couplings in the various quark and lepton sectors can then be constrained by experimental data, see Ref. [16] for a recent assessment of the relevant bounds in the quark sector using mainly hadron decays with missing energy.In this article we explore a novel direction to probe avour-violating axion couplings involving b-quarks using the present and future LHC data collected to study B q → µµ.
We therefore focus on avour-violating b → q transitions, which are described by the Lagrangian where F V /A bq are parity odd/even couplings, q = d, s and a denotes the derivatively coupled QCD axion, whose mass is inversely proportional to the axion decay constant, f a , which suppresses all axion couplings.
e decay constant has to be much larger than the electroweak scale to su ciently decouple the axion from the SM in order to satisfy experimental constraints [17,18]. is implies that the axion is light, with a mass much below an eV, and stable even on cosmological scales.erefore, two-body B-meson decays with missing energy, which closely resemble the very rare SM decays with nal-state neutrino pairs that have been looked for at B-factories, stringently constrain the couplings in Eq. (1.1).
e resulting constraints on the vector couplings F V bq (from B → K/πa decays) and the axial-vector couplings F A bq (from B → K * /ρa decays and B q mixing) have been given in Refs.[16] (see also Refs.[19,20]) and are summarised in Table 1.Note that constraints from neutral Table 1: Lower bounds on F V,A bq at 90% CL from B-decays and B q -mixing, taken from Ref. [16].
meson mixing are typically much weaker than the ones from decays to vector mesons, except in the case of b → d transitions.is is mainly due to the lack of experimental data on B → ρνν suitable for the two-body recast.
In the present work, we investigate whether the couplings in Eq. (1.1) can also be constrained at the LHC.To this end, we propose to use the three-body decays B s,d → µµa, where the muon pair originates from an o -shell photon, cf., Figure 1 (le ).With the main goal of measuring the SM decay B q → µµ, the ATLAS [21], CMS [22] and LHCb [23] collaborations have collected di-muon events with an invariant mass q 2 down to roughly (5GeV) 2 .As long as no vetos on extra particles in the event are applied, these datasets can be used to constrain decays with additional particles in the nal state, e.g., the radiative decay B q → µµγ, as proposed in Ref. [24].In this paper we focus on the LHCb potential although CMS and to some degree ATLAS are capable of a similar study albeit less sensitivity as can be inferred from the combined result [25].Whereas Belle II has an exciting physics program it is not competitive in very rare decays [26].Here, we point out that the same datasets can be used to constrain the decays B q → µµX, where X is a neutral, beyond-the-SM (BSM) particle with a mass that is su ciently small to be kinematically allowed at the tail of B q → µµ, i.e., m X m Bq − 5GeV ≈ 300MeV.In this respect, the radiative decay B q → µµγ merely constitutes a SM background, which we take into account in our analysis.Note that the axion hypothesis (1.1) itself has a negligible e ect on B q → µµγ and thus the SM prediction of that decay is considered.In particular, we suggest that when the measurement of B q → µµγ becomes feasible in the future, it can be directly interpreted in terms of constraining BSM particles that replace the nal state photon.A similar strategy can be applied to s → d transitions, using for example the di-muon data collected at LHCb to study K S → µµ, cf., Ref. [27], and possibly also to c → u transitions, i.e., D → µµ [28].
In the following we focus on the case of the invisible QCD axion, a, but our analysis can be readily extended to other particles appearing in the nal state, as long as they are not vetoed in the event.In particular these could be heavy axions decaying within the detector, i.e., axion-like particles (ALPs).We expect such an analysis to be fully inclusive, that is, independent of the ALP decay mode.Similarly our proposal can be extended to constrain light vectors with avour-violating couplings, e.g., dark photons or Z s.We explore these scenarios in Appendix B. In this article we demonstrate the key elements of the analysis and perform the rst sensitivity studies based on the published dataset of the LHCb collaboration.
e ATLAS and CMS data can be analysed analogously.e photon o -shell form factors are necessary for predicting branching fractions of B q → X where X is any of the above mentioned light BSM particles.We discuss the complete set of form factors, relevant for the dimension-six e ective Hamiltonian, compute them with QCD sum rules (SRs) and t them to a z-expansion.In addition the o -shell basis is shown to be related to the standard B → V = ρ 0 , ω, φ . . .basis through a dispersion representation, which interrelates many properties of these two sets of form factors.
is article is organised as follows: In Section 2 we provide the di erential rates for the axionic decay B q → µµa and the radiative decay B q → µµγ.In Section 3 we provide the tools necessary to perform the analysis and use available background estimates and data from LHCb's B s → µµ measurement to evaluate the sensitivity to B q → µµa at present and future runs.We conclude in Section 4. Appendix A contains the computation of the B → γ ( * ) form factors as well as their extension through a dispersion relation to the region of low lying vector mesons.In Appendix B we provide the di erential rate for B → γ * (→ + − )V and apply the proposed analysis.

Di erential Decay Rates
In this section we calculate the di erential rates for the axionic B q → a and radiative B q → γ decay channels.In Figure 1, we show on the le the diagram for the axionic decay and in the centre and on the right representative diagrams for the radiative decay.e rates are di erential in the lepton-pair momentum q ≡ p + + p − , and depend on non-perturbative B q → γ ( * ) form factors with on-or o -shell photons, which we brie y introduce before presenting the di erential decay rates.Finally, we evaluate the rates close to the kinematic endpoint (4.9GeV) 2 q 2 < m 2 Bq , and compare our prediction for the radiative decay to results in the literature.e diagram to the le is the main axion process B q → a whereas the two diagrams in the centre and the right belong to the B q → γ background.e single and double lines stand for the q and b-quark, respectively.e le and central diagrams depend on o -shell form factors in the sense that the photon that emits the two leptons is o -shell.Diagrams in which the photon couples to b-quarks are not shown, but are analogous.Also diagrams with Q 9,10 -operator insertions are not shown, and resemble the diagram on the right and are proportional to C 9 V ⊥ and C 10 V .
2.1 Summary on B q → γ * form factors We describe B q (p B ) → γ * (k) transitions with o -shell photons by a set of form factors with two arguments F * (q 2 , k 2 ) ≡ F B→γ * (q 2 , k 2 ).e rst argument (here q 2 ) denotes the momentum transfer at the avour-violating vertex while the second argument (here k 2 ) denotes the momentum of the photon.For on-shell photons, i.e. k 2 = 0, these form factors reduce to the well-known on-shell form factors F (q 2 ) ≡ F * (q 2 , 0) given in Eq. (A.22).A complete set of form factors is given by 1, 2 where q2 ≡ q 2 /m 2 B throughout, q ≡ p B − k denotes the momentum transfer at the avour-violating vertex, and we de ne the o -shell photon state γ * (k, ρ)| as in Eq. (A.2). e coe cients depend on the sign convention, s e , for the covariant derivative e Lorentz tensors R are de ned by where G αβ ≡ g αβ − k α k β /k 2 and (kinematic) ha ed quantities are divided by m 2 Bq , e.g.k2 ≡ k 2 /m 2 B . e matrix element satisfy the QED and the axial Ward identities e la er implies the relation between the pseudoscalar and one of the axial form factors which reduces the number of independent form factors down to a total of seven.At q 2 = 0 there are two further constraints where V * L (q 2 , k 2 ) ≡ −q 2 /2V * L (q 2 , k 2 ) thereby reducing the form factors down to ve.An extensive discussion including dispersion representations in the q 2 and k 2 variables, the derivation of Eq. (2.5), the limit to photon on-shell form factors, and their computation from QCD SRs are deferred to Appendix A. As an example let us quote the once-subtracted dispersion representation for the pseudoscalar form factor for which simpli es for the subtraction point k 2 0 = 0. Above f em φ is the electromagnetic decay constant in the Appendix A.2 and A 0 is the pseudoscalar B s → φ form factors [29].In the last step the dispersion relation was evaluated for the lowest term and the dots stand for terms higher in the spectrum e.g.B s → φ and multiple particle con gurations.e o -shell form factors in the limit of small momentum transfer at the avour-violating vertex, T * ⊥, ,L (0, q 2 ), V * ⊥, ,L (0, q 2 ) and P * (0, q 2 ) are computed in this work for the rst time. 3e B → γ * (→ + − )V di erential rate, presented in Appendix B, is another example application of these o -shell form factors for searches beyond the SM. e extension of the o -shell form factors to low k 2 into the resonance region, as shown in (2.6), requires the matching to the QCD SR result.e discussion thereof is delegated to Appendix A.4 and involves, by choice, a multiple subtracted dispersion relation where additionally use is made of the on-shell form factors. Plots are shown in Figure 6.An ancillary Mathematica notebook is added to the arXiv version for reproducing the form factors. is is for example relevant for the prediction of B → γ as the two form factors T * ⊥, (0, k 2 ), where k 2 is the dilepton mass, enter the description.ese two form factors have also been considered in [32][33][34] using improved vector meson dominance.
For the on-shell form factors B → γ we use the next-leading-order (NLO) light-cone sum rule (LCSR) computation [35].Note that the QCD SR result of the o -shell form factors can be used in the relevant kinematic region (4.9GeV) 2 q 2 < m 2 Bq since thresholds are far away.e photon on-shell form factors are more challenging in this region because the light-cone expansion breaks down.ey can, however, be extrapolated to this region by using a B * q and B q1 -pole ansatz, with the residue computed from LCSR [36], supplemented with z-expansion corrections to account for further states.e weak annihilation process, that is B → V γ * four-quark matrix elements where the B valence quarks annihilate, contain some of these form factors as sub processes.Weak annihilation has been computed in the SM to leading order (LO) in QCD factorisation [30] and including all BSM operators in LCSR [31].However, the discussion in our paper is more complete as even the BSM computation in Ref. [31] does not include all form factors since the V -mesons do not couple to scalar operators for instance.Moreover in Ref. [32], the o -shell form factor T * ⊥ (0, k 2 ) = FT V (0, k 2 ) is evaluated using a vector-meson-dominance approximation.

The B q → a di erential rate
Given the e ective Lagrangian in Eq. (1.1), the amplitude for Bq (p where A er squaring this amplitude, summing over fermion spins, and integrating over the unobserved axion momentum, the di erential rate in the invariant mass of the nal-state leptons, q 2 , becomes where Bq ≡ λ(m 2 Bq , q 2 , m 2 a ), and λ(x, y, z) ≡ x 2 + y 2 + z 2 − 2xy − 2xz − 2yz is the Källèn function.For our work it is su cient to approximate m a → 0.

2.3
The B q → γ di erential rate e relevant part of the e ective SM Lagrangian5 where λ t ≡ V * tb V tq and q L,R ≡ (1 ∓ γ 5 )/2q, and Q 7,9,10 is obtained from Q 7,9,10 by the replacements L → R and m b → m q .
Upon using the helicity vector completeness relation (A.19), the Bq (p Bq , q 2 , 0) and r q ≡ −is e e/2 λ can be factored into a leptonic L V,A λ = * ν (q, λ)ū(p − )γ ν [γ 5 ]v(p + ) and the hadronic helicity amplitude follows from the Hamiltonian with where the leptons are removed up to a normalisation factor.For the e ective axial leptons one nds and for the vector ones where M µρ (k, q) takes into account both types of diagrams in Figure 1.Explicit parameterisations of the polarisation vectors and some more explanations can be found in Appendix A.1.3.Using the (common) convention (2.14) e last equality relates T * (0, q 2 ) to T * ⊥ (0, q 2 ), see Appendix A.1.6 and footnote 8 just before Eq.(A.1).Above we omi ed the contribution from photons radiated o nal-state muons, because these are obtained from the B q → µµ rates using PHOTOS, cf., Ref. [37].Going slightly lower in q 2 would necessitate the inclusion of broad charmonium resonances [38,39].For an overview of other non form-factor matrix elements see for instance Refs.[32,38].
A er integrating over the unobserved photon momentum, the di erential rate for the radiative mode and c V ≡ (q 2 + 2m 2 ), c A ≡ (q 2 − 4m 2 ) are e ectively the squared leptonic helicity amplitudes.
2.4 B q → µµa and B q → µµγ close to the kinematic endpoint To illustrate the relative importance between the SM background B q → µµγ and the B q → µµa signal we take as a reference value for the avour-violating coupling F A bq = 10 6 GeV.In Figure 2, we show the di erential rate normalised with respect to the two-body decay width where X = a, γ, m 2 µµ ≡ q 2 .In the le panel, we show the predictions for the B s decays and in the right the corresponding ones for the B d case.e binned (green) predictions are the B q → µµ rates including photon radiation from the nal-state muons using PHOTOS (see Ref. [37]).e red solid lines are the rates from the axion mode for the reference value F A bq = 10 6 GeV (note that the relative enhancement between le and the right panel is due to the normalization, which carries a di erent CKM suppression.).e black lines are the B q → µµγ predictions when the photon does not originate from muon bremsstrahlung.ey depend on the treatment of the non-perturbative input, i.e., the hadronic form factors introduced in Section 2.1.In all cases, we use the same perturbative input, namely the SM Wilson coe cients C e 7 , C e 9 and C 10 evaluated at the hadronic B q scale.We obtain C 10 from Ref. [37] and use flavio [40] to evaluate C e 7 and C e 9 .We show the results of three di erent approaches of estimating the relevant hadronic form factors: • Dashed line: the QCD SR form factor computation discussed in Section 2.1 and Appendix A,  Figure 2: Comparison of the axionic decay mode B q → µµa (red solid lines) and the radiative B q → µµγ modes (black lines).e le panel shows the B s case while the right the B d case.For the axion predictions F A bq = 106 GeV is assumed as a reference value.e di erent black lines are the photon predictions with di erent form factor treatments (see legend and main text).In green are bins of the two-body B q → µµ rate including radiation from nal-state muons.To be er compare the B s and B d cases, all rates are normalised to their respective two-body decay B q → µµ, which is why the B d → µµa line appears enhanced with respect to the B s → µµa one.
• Dotted line: the quark-model approach of Ref. [32], • Dashed-dotted line: the pole-dominance approach supplemented by experimental data and heavy-quark e ective theory of Ref. [41].It is speci c to the B s case (le panel).
e agreement of the predictions is rather crude.For q 2 ≈ (4.9GeV) 2 , our prediction is about a factor of three larger than the quark model [32] and about a factor of two smaller than the pole-dominance approximation [41].e disagreement with the quark model is not surprising as the method is designed for low q 2 and, unlike in our work, no additional input is employed to constrain the residua of the leading poles near the kinematic endpoint.e agreement of the form factors themselves at lower q 2 , which we do not show, is much be er.e comparison with the pole-dominance approach [41] has two major components.e di erence in the B * q -residue and the fact that the e ect of B q1 -resonance is neglected in Ref. [41] cf.Appendix A.5.1.While it is important to understand 6 the origin of the discrepancy in light of a possible measurement of the radiative decay, the discrepancy does not play a signi cant role in obtaining a bound on the axion couplings F A bq , which we derive in the next section.

Sensitivity at LHCb
In this section we recast the LHCb analysis of Ref. [23] to obtain an estimate for the current and future sensitivity of LHCb to probe the avour-violating couplings F A bs and F A bd .We rst discuss, in Section 3.1, how we extract the backgrounds by rescaling the original LHCb analysis, and derive the expected number of events in each bin for a given luminosity.We then describe, in Section 3.2, our statistical method and provide the recast of the present data and the sensitivity study for future runs.Our main results are summarised in Tables 2 and 3.

Rescaling the LHCb analysis
e B s → µµ analysis of LHCb in Ref. [23] makes use of datasets collected at di erent LHC runs, with luminosities L 7 = 1.0 −1 from 7 TeV, L 8 = 2.0 −1 from 8 TeV, and L 13 = 1.4 −1 from 13 TeV runs.Under the SM hypothesis, a total number of 62 B s → µµ events and 6.7 B d → µµ events are expected in this analysis in the full range of boosted-decision-trees (BDT) and the signal window (m µµ ∈ [5.2, 5.445] GeV).Since the BDT discrimination is at one expects half of these events to pass the BDT > 0.5 selection.For this BDT selection, LHCb supplies a plot with backgrounds, which we use to extract their numerical values.By combining the expected number of B q → µµ events in the SM with the SM branching-fraction predictions, we extract a universal rescaling factor, r ≈ 0.079, via In these equations, i labels the √ s run and σ i is the corresponding b-quark production cross section in the acceptance of LHCb.e la er has been measured by LHCb for √ s = 7, 13 TeV, σ b,7 = 72 µb and σ b,13 = 144 µb [44].For σ b,8 we linearly rescale the 7 TeV value (σ b,8 = 8/7σ b,7 ).f d and f s are the fragmentation ratios of b-quarks that are produced at LHCb and fragment into B d and B s , respectively.We absorb f d in the rescaling factor, r, and use the ratio f s /f d to obtain N Bs . is ratio has been measured by the LHCb collaboration to be f s /f d = 0.259 ± 0.015 [45].Finally, summarises the experimental e ciencies and all other global rescaling factors, which we absorb into the de nition of r. e quantities BR's in Eq. (3.1) are the respective branching ratios in the signal window.is includes the e ect of photon radiation from muons [37,46], which LHCb simulates with PHOTOS.e overline in the branching-ratio prediction indicates that the partial width is divided by the width of the heavy mass eigenstate (Γ H Bs , Γ H B d ) to obtain the branching fraction.In this way the e ect of B q -mixing is included [37,47].
is is relevant for the B s system, but much less so for the B d system. is is numerically equivalent to LHCb's treatment of the e ective lifetime, cf.Eq. (1) in Ref. [23]).
LHCb's BDT > 0.5 selection covers the m µµ ∈ [4.9 GeV, m Bs ] region in bins of 50 MeV.We apply the same universal rescaling factor, r, to rescale the predictions of all B q → µµa and B q → µµγ branching fractions for all m µµ bins. is is a good approximation as there are no triggers or similar thresholds that signi cantly change the rescaling over this invariant-mass range.In the next section, we present the sensitivity of this analysis to probe the avour-violating F A bs and F A bd axion couplings in future runs of LHCb by rescaling the 13 TeV dataset.We denote the corresponding e ective total luminosity by At a given total luminosity, L, the expected number of events at a given m µµ -bin (Bin k ) then is with shorthands ) .e quantity N BKG,analysis Bin i is the expected total number of background events that do not originate from the radiative decay in the given bin.We obtain N BKG,analysis Bin i by digitising and integrating the plot of LHCb's BDT > 0.5 selection.In Eq. (3.3) we kept separate the rate from photon emission from muons (B q → µµ(nγ)) and the rate from photon emissions from the initial state (B q → µµγ).In principle, the amplitudes interfere but the interference is tiny close to the B q threshold and we thus neglect it.

Recast and sensitivity analysis
To compute the sensitivity of the LHCb analysis in probing F A bs and F A bd , we must combine the information of all m µµ bins and include statistical and systematic uncertainties.We neglect the subdominant experimental systematic uncertainties but will include the theory uncertainties associated to the form factors entering the three-body rates.In what follows we always either turn on F A bs or F A bd , i.e., but will not let them oat simultaneously.
Each m µµ bin corresponds to an independent counting experiment that obeys Poisson statistics.Exclusion limits on F A bq are then obtained from a joined Poisson (Log)Likelihood.For a su ciently large number of events, Poisson statistics are well described by Gaussian statistics and the Poisson (Log)Likelihood is equivalent to a χ 2 function of the NP parameter, i.e., F A bq : with i numbering the bins and q = s, d.N i = N i (F A bq ) denotes the total number of events (background plus signal) for the value F A bq in a given bin, whereas N obs i is the observed number of events.For the recast we use the actual number of events observed by LHCb, read o from Figure 1 in Ref. [23].To project the sensitivity for future LHCb runs we set N obs i to the number of events expected in the SM. e covariance matrix, V cov , incorporates statistical and systematic uncertainties in a way that we discuss below.If we neglect systematic uncertainties, this matrix is diagonal and only contains the squared Poisson variances, V cov = V stat with (V stat ) ij = δ ij N i .We have explicitly checked, that for the data samples considered here, the Poisson (Log-)Likelihood is always very well approximated by the χ 2 .
To incorporate systematic/theory uncertainties we follow the commonly used approach of Ref. [48].eory uncertainties are then treated as Gaussian uncertainties smearing the expectation values of the underlying Poisson probability distribution functions.We can then obtain the limits on F A bq by generating Monte-Carlo events based on the joined Poisson likelihood a er smearing the expectation values by the (correlated) systematic errors.If the measurement is well-described by Gaussian statistics (as in our case) and the systematic uncertainties are small with respect to the statistical ones, this treatment of uncertainties is equivalent to adding the statistical and systematic errors in quadrature in V cov .
In our case the main systematic uncertainties are due to the form factors that enter the radiative B q → µµγ and the B q → µµa rate.Since the uncertainties in the form factors originate in part from uncertainties in input parameters like m b and qq that are q 2 -independent, the predicted number of events among di erent bins are correlated.erefore, the full covariance matrix for the case in which the axion has a coupling F A bq is not diagonal and decomposes into Here, (V stat ) ij = δ ij N i are the statistical uncertainties, while the matrices V γ , V q a , and V q a−γ describe the correlated errors among the predictions of various rates over the bins.Aside from trivial functional ).e analysis employs a total of 4.4 fb −1 of data from runs at 7, 8, and 13 TeV.In the columns labelled "sys+stat" we combine statistical and theory uncertainties, while in the columns labelled "stat only" we neglect the la er.We see that presently the bounds are dominated by statistical uncertainties.When computing the χ 2 we sum over the ten rst bins of the analysis, i.e., m µ + µ − ∈ [4.9 GeV, m Bs ].
For every case we list the values of the χ  bd (hatched region) as a function of the total integrated luminosity.Shown are the 90% CL exclusion limits assuming that the observed number of events will be the same as predicted in the SM hypothesis.
dependencies on global rescaling factors, e.g., luminosity, we can determine them once and for all by generating Monte-Carlo events in which we vary the parameters on which the form factors depend.In practice we use the mean values of the z-expansion t (of degree four) and their covariance matrix (see Appendix A.5.3) to determine each piece of V cov .Using the covariance matrices we obtain the 90% Con dence Level (CL) exclusion limit on |F A bq |, i.e. χ 2 (F A bq,90% ) − χ 2 min = 1.64.First, we recast the observed data of LHCb's analysis [23] in which L = L = 4.4 −1 .e measurement is dominated by statistical uncertainties, but for purposes of illustration we show both the bounds when combining statistical and systematic theory errors and the bounds when only the statistical uncertainty is included.In the χ 2 we include the rst ten bins of the LHCb analysis.e observed data are in good agreement with the SM expectation.Indeed, we nd that the χ 2 of the SM divided by the ten degrees of freedom of the χ 2 (d.o.f.) is χ SM /d.o.f.= 1.6.e best-t points for the axion lies roughly 1σ o the SM.In Table 2 we list the best-t points with their corresponding χ 2 min , as well as the resulting 90% CL exclusion limits on |F A bs | and |F A bd |. ) as a function of the integrated luminosity at LHCb.In the columns labelled "sys+stat" we combine statistical and theory uncertainties, while in the columns labelled "stat only" we neglect the la er.
Next we make projections for future runs of LHCb.As discussed in Section 3.1, to this end we rescale the 13 TeV events assuming LHCb will collect a total of 300 −1 .To compute the sensitivity we assume that LHCb will observe exactly the number of events expected from the SM.erefore, the best-t point always corresponds to observing zero events from axion decays and χ 2 min = 0.For the projection study we present the results both when only statistical uncertainties are considered and when they are folded with the correlated theory uncertainties.In Figure 3 we show the resulting 90% CL exclusion limit on |F A bs | (le panel) and |F A bd | (right panel) as a function of the total luminosity.In addition, the limits for some indicative luminosities are listed in Table 3.
Note that the limit from the actual recast is weaker than the expected limit under the background-only hypothesis.More precisely, if we consider the case L = 4.4 −1 and set N obs GeV.In comparison, the corresponding exclusion limits of the recast (table 3) are slightly weaker.e origin of this di erence is mainly an excess of roughly 10 events in the rst bin of the current LHCb B s → µµ analysis, which can be ed by the best-t point of an axion signal.However, as discussed in the recast the excess is not statistically signi cant and the best-t point of the axion is within 1σ of the SM.

Summary and Outlook
In this article we have proposed a novel method to probe avour-violating couplings of the QCD axion to b-quarks at the LHC, exploiting the di-muon datasets collected for the B q → µµ analyses.To this end, we have computed the relevant di erential decay rates for the decay of a B q -meson to muons and an axion B q → µµa [Eq.(2.8)] and the radiative decay B q → µµγ [Eq.(2.15)], which is a background to the former process.
ese rates depend on non-perturbative B q → γ ( * ) form factors, which we have discussed from a general viewpoint, computed with QCD sum rules (at zero avour-violating momentum transfer).To the best of our knowledge this is the rst discussion of the complete set of form factors, for the dimension-six e ective Hamiltonian H b→(d,s) e , supplemented with an explicit computation of all form factors in Appendix A. e uncertainty of this leading order computation is estimated to be around 25% and could bene t from radiative corrections which is, however, an elaborate task.We extend these form factors to the low-lying resonance region of un avoured vector mesons by using a multiple subtracted dispersion relation cf.Appendix A.4.A Mathematica notebook is appended to the arXiv version.Besides being useful for axion searches these form factors are also the ingredients for other light BSM particle (e.g.dark photon) searches.In addition, we have exposed the relation between the introduced basis and the standard B → V basis through the dispersion representation in Appendix A.2, which interrelates form-factor properties of the two bases.A further application of the o -shell form factor formalism is to consider the charged case with q 2 , k 2 > 0 which is needed to describe B → + − ν.We postpone this task to a future study as this involves discriminating charges and dealing with contact terms but note that these decays has recently been described by di erent groups [49][50][51][52] using di erent approaches.
With these decay rates we performed a recast using available LHCb data and estimated the sensitivity to B q → µµa at present and future runs, taking into account the SM background B q → µµγ.We nd that present data constrain the relevant axion couplings F A bd (F A bs ) to be larger than 3.0(2.4)• 10 5 GeV at 90% CL [Table 2], while the full LHCb dataset will probe scales of the order of 10 6 GeV in both b → d and b → s transitions (Table 3).
For stable axions, these results should be compared with the ones derived from B-meson decays with missing energy.In the case of b → s transitions, the data from the BaBar collaboration on B → K * νν provide constraints that are roughly two orders of magnitude stronger than the ones from our LHCb recast of B s → µµa, cf.Table 1.For the case of b → d transitions, the BaBar constraints are roughly of the same order than the ones that LHCb can obtain in upcoming runs.Nevertheless, the combination with the corresponding ATLAS and CMS analyses of B q → µµ may improve the bounds signi cantly.In Appendix B we made further use of the o -shell form factors to estimate the strength of LHCb to search for light, Beyond the Standard Model, vectors.
While it is remarkable that the LHC can play a role in constraining couplings of the QCD axion, the analysis of B q → µµa that we have presented here can be relevant for other extensions of the SM with light neutral particles with avour-violating couplings.Since the B q → µµa analysis is inclusive, it can be extended to search for light BSM particles even if they decay within the detector.For example, an ALP that decays promptly to, for instance, photons may be subject to cuts on additional photons in the analyses of B → K(a → γγ) at the B-factories and thus evade detection, while it would be kept in the B q → µµ(a → γγ) samples at the LHC.erefore, the analysis that we have presented here complements axion searches in rare meson decays with missing energy at B-factories, and can play an important role in constraining avour-violating couplings of light particles.
the photon o -shell momentum which is a variable rather than a constant.Hence the MEs are functions of two variables and this leads to a more involved analytic structure.In this paper we restricted ourselves to the kinematic region q 2 ∈ [(4.9GeV) 2 , m 2  Bq ], where the form factors (FFs) can be expected to dominate over long-distance contributions.
is appendix is structured as follows.Firstly, we de ne and state relation and limits of the FFs in Section A.1, the link with the B → V basis is discussed in Section A.2, the QCD SR computation of the o -shell FFs follows in Section A.3 and nally we turn to the FF-parametrisation and ts in Section A.5. Note, that sections A.1, A.2 and A.5 are independent of the method of computation.

Form Factor
De ned in Eqs.
(2.1, A.1) Table 4: Overview of FFs referencing de nitions, graphs, and analytic structure.e la er de nes the region of validity of the computation.Long-distance contributions are relevant in other kinematic regions [32,38].For B d → γ * , m 2 φ is to be replaced by m 2 ρ,ω above.
A.1 Definition of B → γ ( * ) form factors We introduce a complete set of o -shell FFs which is related to the standard B → V basis [29,54] via dispersion relations, cf.Section A.2. On a technical level this appendix extends previous work [32,53], in that we discuss the full set of seven vector and tensor FFs and not only those needed for the SM transition.e complete basis is for example useful for other invisible particle searches such as the dark photon.e o -shell FFs are not to be confused with the on-shell FFs which have received more a ention in the literature [32,33,53,55].An overview of the on-and o -shell FFs used for this paper are shown in the diagrams in Figure 1 and contrasted in Table 4.
e rst argument (here q 2 ) denotes the momentum transfer at the avour-violating vertex while the second argument (here k 2 ) denotes the momentum of the photon emi ed at low energies.
We introduce a new o -shell basis via a dispersion representation based on the standard B → V basis [29,54].Below we state the basis before turning to the construction in Section A.2. e absence of unphysical singularities in the matrix element enforces relations between FFs which we discuss in some detail.We will refer to this circumstance as "regularity" for short.
e complete set of FFs were already introduced in the main text in Eq. (2.1) and reproduced here for convenience 7,8 where b P ≡ where are Lorentz tensors with convenient properties (cf.below) and herea er and it is noted that R ρ = q µ G µρ (A.3).
which in turn holds without contact term since the electromagnetic current is invariant under non-singlet axial rotations.Eq. (A.7), upon using q µ R µρ ⊥, ,L = 0, implies that Regularity enforces constraints on the FFs de ned in (A.1). 9ere are two constraints at q 2 = 0 and Bq respectively.e Ward identity (A.8) enforces where V * L is implicitly de ned by ere are two further constraints due to the parametrisation of the form factors at which are of a similar type as the .39).Whereas the constraints (A.9) and (A.12) are imposed by the FF-parametrisation (avoiding spurious kinematic singularities), (A.11) is of algebraic origin cf.Section A.1.6 for the derivation.
A.1.3The four photon on-shell form factors F (q 2 ) ≡ F * (q 2 , 0) We next turn to the case where the low-energy photon is on-shell; k 2 = 0. We introduce the commonly used shorthand F (q 2 ) ≡ F * (q 2 , 0) , for F ∈ {P, V ⊥, ,L , T ⊥, ,L } , (A.13) (or F B→γ (q 2 ) ≡ F B→γ * (q 2 , 0)). e basic physics idea is that the absence of the photon's zero helicity component implies the vanishing the pseudoscalar FF and the zero helicity part of the vector FFs.We may de ne the helicity amplitude for B → µµγ by One then obtains the two B → γµµ helicity amplitudes and the ± direction obey and the formulae for the vector V as well as the tensors T, T 5 are similar.e common de nition explains the notation.In order to obtain these results it is useful to choose an explicit helicity vector parameterisations.E.g. in the B q -meson restframe (e.g.[Eq.15] in [56]) ω(q, 0) = (−v, 0, 0, q 2 + v 2 )/ q 2 , ω(q, t) = q/ q 2 , q = ( q 2 + v 2 , 0, 0, −v) , where the velocity is given by v ≡ | k| = λ 1/2 (m 2 Bq , q 2 , k 2 )/(2m Bq ) and k → k µ is the Lorentz-index convention in the formulae above.is helicity basis is useful since with G λλ = diag(1, −1, −1, −1) and the rst entry denotes the t-component.Second, in the limit k 2 → 0, * (k, 0) ∝ k, and this enforces, since it is equivalent to the QED Ward identity (A.6).Eqs.(A.15, A.20) lead to the constraints V (q 2 ) = V L (q 2 ) , T (q 2 ) = T L (q 2 ) , P (q 2 ) = 0 , (A.21) and reduces the seven FFs of Eq. (A.1) to four.Alternatively one can infer the constraints (A.21) from the regularity of the matrix elements as k 2 → 0. e regularity condition and the helicity arguments are clearly related.For completeness we give the explicit k 2 → 0 basis [38,55] 10 where e charged FF Bu → γ ( * ) is similar but comes with a non gauge invariant contact term for the axial vector structure.is contact term is canceled by the photon emission of the lepton [35].
Relation to the standard B → V basis We consider it worthwhile to comment on some aspects in the standard basis of B → V FFs e.g.[29].e k 2 → 0 limit is then akin to m V → 0. e relations V (q 2 ) − V L (q 2 ) = T (q 2 ) − T L (q 2 ) = 0 implies Such relations were noted previously.Firstly, in the B → V context in Ref. [57] in Appendix A and around Eq. [5] in Ref. [31], where it is argued that the relation has to hold in order to cancel a kinematic 1/m V -factor.Second for B → γ (m V = 0) they were previously reported in Ref. [53] as a consequence of regularity.
A. 1.4 The five form factors F * (0, k 2 ) at zero flavour-violating momentum transfer In the process B q → X, with X a light BSM particle, the limit in which the avour-violating momentum transfer goes to zero, i.e., q 2 = 0, corresponds to the case of zero or small mass of X.In this limit the two constraints in Eqs.(A.9) and (A.11) reduce the number of independent FFs from seven to ve. e matrix elements, at q 2 = 0,the become where Eqs.(A.9,A.11)and 2k•q| q 2 =0 = m 2 Bq − k 2 have been used.At q 2 = 0 the constraints (A.12) imply Bq ) nite, the last constraint is obeyed trivially by (A.11).

A.1.5 Counting form factors
Since the last few sections were a bit involved with many steps we summarise the classi cation in Table 5.
In general there are seven FFs for the B → 1 − transition.In the photon on-shell case this reduces to four because the photon comes with two polarisations only.In the case of zero avour-violating momentum transfer the two general constraints (A.9, A.11) reduce this number from seven to ve.
At last we turn to the derivation of the relation (A.11).We may choose to proceed by uncontracting the B → V matrix element, rst in q ν (A.1) with shorthands x i = x i (q 2 , k 2 ), p = p B , η is the polarisation vector of a massive vector boson and square brackets denote antisymmetrisation in the respective indices.e corresponding uncontracted type\ Table 5: e J P = 0 + FF vanishes by parity conservation of QCD.Generally, there are seven independent F * (q 2 , k 2 ) FFs (light-blue) with two constraints V * L (0, k 2 ) = P * (0, k 2 ) (A.9) and T * (0, .11).For the photon on-shell case, F (q 2 ) ≡ F * (q 2 , 0), there are four independent FFs (light-red) and the reduction is due to the absence of the photon 0-helicity component.At zero avour-violating momentum there are ve independent FFs (light-green), due to the two constraints mentioned above.For the computation of the B → γ SM rate, the following ve FFs are su cient {V ⊥, (q 2 ), T ⊥, (q 2 ), T * ⊥ (0, k 2 )}.
B → γ * matrix element then reads where with c some i-independent kinematic function (X i = cx i ) which is irrelevant for our purposes.e appearance of the tensor G ρα can be understood from the viewpoint of a dispersion relation cf.Section A.2 or and footnote 12. Regularity enforces at k and at k 2 → 0 we have X 0 (q 2 , 0) + X 2 (q 2 , 0) = 0 .(A. 31) ese two constraints are generally valid.
We may make the connection with our basis by identifying to obtain where mV ≡ m V /m Bq .e analogue of the two (q 2 = 0) constraints (A.9, A.11) are respectively.e constraint (A.12) does not apply since m 2 V = m 2 Bq .As stated above the relation between the FFs becomes apparent in the dispersion representation (cf. the textbook [58] or the recent review [59]).A speci c example is chosen for illustration,13 where u low = (m P 1 + m P 2 ) 2 , and V → P 1 + P 2 is the lowest decay channel (e.g. Note, that the appearance of the tensor G ρ α (A.5) goes hand in hand with the QED Ward identity constraint.
In order to further illustrate we resort to the narrow width approximation (NWA) which can be improved by introducing a nite decay width or be er multiparticle states of stable particles (cf.remark at beginning this section).With the NWA and the spectral or discontinuity function ρ T i (q 2 , u) assumes the simple form where the dots stand for higher states in the spectrum.e residua r V T i are then given by where f em V is a conveniently normalised matrix element of the electromagnetic current and the vector meson.In particular Rewriting our parametrisation (A.1), in compact form, and equating with (A.40) we are able to identify the two bases where ω Ji is, by construction, a matrix with diagonal entries only.Of course we could have chosen any other basis at the cost of having a non-diagonal ω-matrix but we feel that this is an economic way.
Let us make the dispersion representation more concrete by clarifying what the subtraction terms mean in (A.40).Whether or not the B → γ * FFs does require a subtraction is practically not important since it is advantageous to do so anyway as the on-shell FF, which provides the subtraction information, is well-known and also physical. 14One may write where J =⊥, L, P and i = 1, 3, P with ω Ji de ned above and the same formula applies for T * J → V * J and ρ T i → ρV i .e T, V -FFs are a bit more involved since the matching factor ω 2 ∝ (1 − k2 ) contain a non-trivial k dependence.In order to avoid the arti cial pole we may de ne an expression which establishes regularity at k 2 = m 2 Bq and the corresponding subtracted dispersion relation reads e same applies again for V * with the substitutions T * → V * and ρ T 2 → ρV 2 .e analogy with (A.49) is restored if one divides the la er equation by ω Ji .
For the sake of clarity, we give a few examples of FFs in the k 2 -dispersion representation (A.49) which 14 e asymptotic behaviour of the triangle function at LO, cf. Figure 4 (le , center), is O(ln k 2 ) as can be inferred from the explicit expressions in (A.58).
is continues to hold when resumming the leading-log expressions, cf.Ref. [60], for the correlation functions in question (with Nc = 3).
e dispersion relation in p 2 B and its restriction to the nite interval [m 2 b , s0] leads to a FLOPT ∝ 1/k 2 asymptotic behaviour.It is possible that this changes at NLO.However, there are condensate terms of order F LO qq ∝ O(1) as can be inferred from (A.58) originating from diagrams where the photon is emi ed from the b-quark.It might be that at NLO this turns out to be O(ln k 2 ).Note that, reassuringly, in the limit q 2 , k 2 → −∞ all condensate terms vanish as one would expect since perturbation theory dominates in this regime.At least when investigating the mixed quark-gluon condensate term one can see that it is suppressed relative to the condensate terms suggesting that the OPE itself converges.In passing let us note that the di erence to B → V or B → γ FFs is the o -shellness of the photon which does not require a double cut for the FF and can lead to a more divergent expression.
illustrates some of its properties: where the minus sign is due to the minus sign in (A.43) which in turn comes from the electromagnetic interaction term.Above the k 2 + i0 prescription has been dropped for brevity and has been used.ese formulae show that properties of the B → γ * -and B → V -FFs imply each other.For example, the ). e algebraic relation (A.11) follows from (A.49), if T (0, m 2 Bq ) = 0 holds which in turn follows from (A.12).In the SU (3) as expected.ese relations can be turned around since they hold for any k 2 , they necessarily hold at each point of the spectrum and thus for the B → γ * properties imply the B → V FF properties.
Moreover, the examples reveal that the slope of the FF are positive which is the choice by convention. is is the case since r φ > 0 and r ρ > |r ω | > 0. At last let us note that a particularly convenient form for P * can be obtained if ones chooses the subtraction point k 2 0 = 0 where the pseudoscalar FF vanishes.is corresponds to (2.6) in the main text given as an illustration.

A.3 Explicit results of the o -shell form factors
e FFs are computed using QCD SRs [61].e starting point is the correlation function of the form x,y where (b V s e e) = −m Bq , Π V,A = Π V,A (q 2 , p 2 B , k 2 ) are analytic functions in three variables and the Lorentz structures R µρ are de ned in (A.3).Gauge invariance, again, holds in the simplest form k ρ Π V µρ (p B , q) = 0 since we work with electrically neutral states.e term C µρ (q 2 ) is a contact term but of no relevance for our purposes since they are p 2 B -independent.It is the correction to the naive non-singlet axial Ward identity (A.7).e operator J Bq ≡ (m b + m q ) biγ 5 q is the interpolating operator for the B q -meson with matrix element Bq |J Bq |0 = m 2 Bq f Bq .e QCD SR is then obtained by evaluating (A.54) in the operator product expansion (OPE) (cf. Figure 4) and equating it to the dispersion representation.e OPE consists of a perturbative part and a condensate part for which we include only the quark condensate.e OPE is convergent, in a pragmatic sense, for momenta p 2 B , q 2 < O(m b Λ) and k 2 < −Λ 2 with Λ ≈ 500MeV a typical hadronic scale.e perturbative part is evaluated with the help of FeynCalc [62,63].We neglect light-quark masses i.e. m d = m s = 0.
e dispersion representation of Π V ⊥ reads where the dots stand for higher resonances and multiparticle states.Moreover the NWA for the B-meson has been assumed.e FFs are then extracted via the standard procedures of Borel transformation and by approximating the "higher states" contribution by the perturbative integral [61].e la er is exponentially suppressed due to the Borel transform in p 2 B .Note, that the contact term C µρ (q 2 ), which can appear as a subtraction constant in the dispersion relation, vanishes under the Borel transform.Above πρ ] and M 2 is the Borel mass.If we were able to compute ρ V ⊥ exactly then V ⊥ (q 2 ), obtained from (A.56), would be independent of the Borel mass and it therefore serves as a quality measure of the SR.Other FFs are obtained in exact analogy with the exception of F (F = V, T ) Let us turn to a technical point.Namely on how to avoid spurious kinematic singularities.e constraints (A.12) avoid these constraints and in the computation they are satis ed provided that s = m 2 Bq .However, since that is only satis ed within O(1%) in a SR some care needs to be taken.is procedure is equivalent to considering the FF combination in (2.1) proportional to 1/(1 − k2 ) as a single FFs and thus avoids this spurious pole which is the correct treatment.More concretely, let us introduce is procedure is equivalent to de ning the improved density as follows: Crucially, the extra 1/(1 − k 2 /s) does not render the density singular at that point which is equivalent to the statement that there are no 1/(1 − k 2 /m 2 B ) poles in the matrix elements.Let us emphasise that if the results carry statistical errors, such as in la ice Monte Carlo simulations then it might be advantageous to directly t F aux and F L and then regain F from the formula above.is ensures cancellation of the pole in that case.
Before stating the results of the computations let us turn to the issue of analytic continuation.We would like to employ our FFs in the Minkowski region k 2 > 0, whereas the OPE is convergent for k 2 < −Λ 2 .e convergence is broken by thresholds at k 2 = 4m 2 q which signal long-distance e ects corresponding to ρ/ω (φ)-like resonances cf.Table 4. e standard procedure is to analytically continue into the Minkowski region and use the FF for say k 2 > 4GeV 2 which is far enough from the lowest lying narrow resonances.For k 2 > 4GeV 2 the resonances are broad and disappear into the continuum.Under qq /k 2 and implicitly assumes k 2 = 0.In the k 2 → 0 limit these diagrams are replaced by the photon distribution amplitude e.g.[35].In the SR method the B-meson is projected out via a dispersion relation in the variable p 2 B giving access to the matrix element of the o -shell form factor F * (q 2 , k 2 ).e momentum assignment corresponds to the convention in the Appendix which di ers from the phenomenological discussion in the main text.
such circumstances local quark-hadron duality is usually assumed to be a reasonable approximation.In our region of use k 2 ∈ [(4.9GeV) 2 , m 2 B d,s ] there are no narrow resonances in the k 2 -channel. 15On a pragmatic level it is best to implement the V B→γ * ⊥ (q 2 , k 2 + i0)-prescription in the process of analytic continuation by deforming the path in (A.56) from where mBq ≡ m Bq /m b , s ≡ s/m 2 b , k2 ≡ k 2 /m 2 b and the perturbative densities are given by ρP * (s, 0, and the improvement discussed around (A.57) reads in the case at hand Further note that the 1/k 2 factor is of no concern since P * (0, k 2 ) ∝ k 2 .e logarithms L q and L b L q ≡ ln k2 lead to imaginary parts in the FFs for k 2 > 4m 2 q = 0 and k 2 > 4m 2 b respectively.From the viewpoint of the original function these singularities are anomalous thresholds which consist of pu ing all the propagators on-shell.ese expression are consistent with the B → V weak annihilation computation detailed in appendix of Ref. [31] cf.footnote 3 for further remarks.Note that, there is no singularity at k 2 = s when expanded properly.e condensate contributions could be wri en in terms of the densities ρ i as well.e backward substitution e −m 2 b /M 2 qq → qq δ(s − m 2 b ) achieves this task.A few comments on interpreting the results.e k 2 → 0 limit is not well-de ned for the condensates.In that limit the condensates originating from quark lines a ached to the photon are replaced by a photon distribution amplitude which makes the FFs computation more involved.However, the perturbative part remains well-de ned in that limit.Hence the la er must contribute positively to the {T ⊥, (0, 0), V ⊥, (0, 0)} by convention which can be veri ed indeed by using Q b = −1/3 and sending k2 → 0. e q 2 constraints (A.9,A.11)are obeyed exactly by the SRs and are assumed as we do not show VL (0, k 2 ) and T (0, k 2 ); they are simply redundant.e constraints at k 2 = m 2 Bq (A.26) are obeyed for the correlation functions with k 2 = p 2 B .However they do not hold exactly for the FFs as p 2 B ≈ m 2 Bq within the approximation of the Borel procedure.We have checked that these relations hold to within 2% where for the last one we compare to a value of the FF at q 2 = 10GeV 2 .In the ts we have implemented these constraints as they are important to cancels the poles present in the Lorentz structures R µρ and R µρ L .e expressions could be improved by m q = 0, adding the gluon condensate and radiative corrections.e rst two are expected to be rather small e ects since 1 m q /Λ QCD and m b qq G 2 .On the other hand, radiative corrections could be sizeable and would reduce the scale uncertainty considerably.is leads us naturally to the sum rule input and error discussion.

A.3.3 Numerical input to sum rules and rough uncertainty estimates
For a LO computation of the sum rule type discussed here the most important uncertainty will be due to the b quark mass and its associated scheme.Since the sum rule is e ectively a ratio of two sum rules, various uncertainties cancel.Some guidance can be taken from the NLO computation of the B-meson decay constant.For the la er it is well-known that NLO corrections are moderate in the MS-bar massscheme [64,65] it is therefore advisable to use that scheme anticipating the smallest scale uncertainty.e LO expression are f Bq , f Bs ≈ (222, 244)MeV with the MS-bar mass m b = 4.18(4)GeV [66].In fact the NLO corrections slightly reduce these values, unlike in other schemes, by less than 10%.Hence one can anticipate an uncertainty of that order.e other input are the condensates for which qq µ=2GeV = −(0.269(2)GeV) 3 (e.g.[66]) is well-known from the Gell-Mann Oakes Renner relation and the strange quark condensate is taken from a la ice computation ss µ=1GeV = 1.08(16) qq µ=1GeV [67].e SR speci c parameters, the Borel mass and the continuum threshold, are determined by a series of constraints.First and foremost we impose these constraints at Euclidean values k 2 < 0 as for positive k 2 the logarithms can lead to large cancellations and invalidate the prcedure.We then proceed by analytic continuation k 2 → −k 2 to obtain the values for k 2 > 0 as found in the results of this paper.is is generally believed to be a reasonable approximation as long as k 2 is not in a resonance region per se (smooth QCD curves).An example where this works well is e + e − → hadrons in the region above the broad regions as can be inferred from the R ∝ σ(e + e − → hadrons)/σ(e + e − → µ + µ − )-plots in [66].
e rst constraint comes from the formally exact relation, Bs = M 2 Bq + 0.5GeV 2 , which respect the mass ratios of the meson roughly.Let us turn to the uncertainty analysis.For the main section we use a z-expansion t with similar uncertainty analysis as in Ref. [29] with some more detail delegated to the t-section.
is analysis and t is restricted to high q 2 .In addition we append a Mathematica notebook to the arXiv version for reproducing the plots.e brief comments on uncertainties apply to this version.e main sources of uncertainty are the Borel parameter and the scale uncertainty.e threshold uncertainty is negligible and the quark condensate uncertainty is only relevant for the B s -mode where its impact is still moderate in almost all cases and regions.e uncertainty due to NLO corrections is estimated to be 15% on grounds of the known radiative corrections to the B-meson decay constant as previously discussed in this section.
e uncertainty of the Borel parameter is roughy 20%, which is on the conservative side, and when added in quadrature this amounts to a ≈ 25% uncertainty.An NLO computation would presumably not only reduce the scale but also the Borel uncertainty.We refrain from a more elaborate error analysis altogether including the extension in the resonance region.is is in principle not di cult to achieve in that one can at rst just vary the input into the dispersion relation.is procedure ought to give a reasonable error estimate.
A.4 Extending the o -shell form factors below the O(2GeV 2 )-region e QCD SR results (A.58) of the last section are valid for k 2 > O(2GeV 2 ).ey are not valid below, since the low-lying vector meson resonances such as the φ-meson, in the B s -case, distort the amplitude considerably.Conversely the 1/k 2 -factors of the condensates, which mimic these e ects in the region of validity become singular in this limit.On the other hand a fair amount is known on the FFs in that region, as previously discussed, cf.appendix A.2 and (A.52) in particular, including the on-shell B → γ FFs and the B s → φ FFs.We advocate that optimal use of this knowledge can be made by using a multiple subtracted dispersion relation with input of this hadronic data and the use of the OPE from the QCD SR for su ciently large k 2 .is is illustrated in Figure 5.
We rst begin by discussing the dispersion relation.We write the generic FF, denoted by F , as the sum of subtraction terms and the dispersion integral where the subscript "12" refers to a single subtraction point at k 2 0 and a double subtraction point at k 2  1 .e double subtraction ensures that even the derivatives is continous at the matching point.e subtraction term is given by where it was used that F is real somewhere on the real axis so that Schwartz's re ection principle applies.We emphasise that these expressions are exact as it is in essence based on partial fraction decomposition.
Let us turn to the speci c treatment proposed.e rst step is to write, in close analogy with (A.52), 16F dis 12 (k 2 ) → F dis-QHD with the di erence that there is no Borel transformation but subtraction terms instead.e superscript PT stands for perturbation theory whereby we mean the SR computation where the condensates are dropped.e quantity u 0 is an e ective threshold marking the onset of multiple particle states and the φ(1680)-meson (and the ρ /ω etc in the B d -case).e residue r φ F is speci c to the FF and can be inferred from the equations in appendix A.2. Finally we can make the outline sketched at the beginning of the section concrete17  where F hdis (k 2 ) ≡ F sub 12 (k 2 ) + F dis-QHD 12 (k 2 ), and the acronym "hdis" stands for hadronic dispersion relation (which is the correct approach when used in the region of discontinuities), and the OPE superscript corresponds to the QCD-SR expressions found in (A.58) which is based on the perturbative and the condensate contributions.In the following we refer to (A.70) as the improved FF.

A.4.1 Some detail on the double dispersion relation including the numerical input
So far we have been somewhat formal on how to obtain the concrete dispersion relations within our speci c computation.We rst give the recipe for the explicit integrals before stating the sources of numerical input.5GeV 2 which is again expected since the polynomial nature of the subtraction points will invalidate its use as well as the on-set of the condensate terms.On the right we show the imaginary part where by construction Im(P * OPE ) = Im(P * dis 12 ) and the improved version is identical above the second matching point.
We may parameterise the densities in (A.58) as follows ρ F = −q F L q − b F L b + rest, and then its discontinuity ρ D , which is formally a double discontinuity, is de ned and given by18   In three cases (the V ⊥, case are similar to T ⊥,L in that region) we plot the [0, 5]GeV 2 -region where the vector meson resonance is visible.For P * the reader can nd more plots in Figure 5.
For the di erential rate we nd with λ (V) Bq de ned above and λ γ ≡ λ(q 2 , m 2 , m 2 ) = q 2 (q 2 − 4m 2 ).e sensitivity study for searching for such light, avour-violating vectors at the tail of B q → µµ is analogous to the axion study presented in Section 3. As an illustration we show here the expected 90% CL exclusion limits with the full LHCb data set of 300 −1 for the di erent cases GeV .
e bounds are computed for the case of m V → 0 so contain only the leading in m V /m Bq term.

Figure 1 :
Figure 1:e diagram to the le is the main axion process B q → a whereas the two diagrams in the centre and the right belong to the B q → γ background.e single and double lines stand for the q and b-quark, respectively.e le and central diagrams depend on o -shell form factors in the sense that the photon that emits the two leptons is o -shell.Diagrams in which the photon couples to b-quarks are not shown, but are analogous.Also diagrams with Q 9,10 -operator insertions are not shown, and resemble the diagram on the right and are proportional to C 9 V ⊥ and C 10 V .

) 1 e
scalar form factor, γ * (k, ρ)|qb| Bq(pB) , vanishes due to parity conservation of QCD.2It is important to keep in mind that these form factors for the B-meson.E.g. if one assumes the phase convention C| B = |B , where C is the charge transformation, then the form factors, (V ⊥ , T ⊥ ) B→γ * = −(V ⊥ , T ⊥ ) B→γ * , change sign but and all others do not.e same holds true for a CP-transformation assuming CP | B = |B .e adaption of phases under the discrete transformation is straightforward.

Figure 3 :
Figure3: Projected sensitivity of LHCb to probe the avour-violating axion couplings F A bs ( lled red region) and F A bd (hatched region) as a function of the total integrated luminosity.Shown are the 90% CL exclusion limits assuming that the observed number of events will be the same as predicted in the SM hypothesis.
i = N SM i (as we do for the projection study) we nd for the statistics-only case |F A bs,90% | < 3.0•10 5 GeV and |F A bd,90% | < 4.0•10 5

A. 3 . 1
QCD sum rule for the o -shell form factors P

Figure 4 :
Figure 4: Figures for o -shell form factors V, T *⊥, ,L and P * .e single/double lines denote the q/b-quark respectively.e diagrams on the le and right are the perturbative-and the one on the le is the quark condensate-type.e quark condensate diagram corresponding to g a) is not shown.It is proportional to qq /k 2 and implicitly assumes k 2 = 0.In the k 2 → 0 limit these diagrams are replaced by the photon distribution amplitude e.g.[35].In the SR method the B-meson is projected out via a dispersion relation in the variable p 2 B giving access to the matrix element of the o -shell form factor F * (q 2 , k 2 ).e momentum assignment corresponds to the convention in the Appendix which di ers from the phenomenological discussion in the main text.

s 0 m 2 b 2 Bq /M 2 fm 2 Bq 2 bemBq qq e −m 2 b /M 2 
ds → γ ds, where γ is a path in the lower-half plane starting at m 2 b and ending at s 0 .One may for instance choose a semi-circle in the lower half-plane.is prescription leads to numerical stability.Clearly our computation remains valid and useful for D 0 → γ * FFs with replacements B q → D 0 and m b → m c .A.3.2 Explicit results for the o -shell form factors from QCD sum rules e explicit FFs are found to be P * (0, k 2 ) = e m Bq −s/M 2 ρ P * (s, 0, k 2 )ds − 2   + O(α s , m q ) , (A.58)

Figure 5 :
Figure 5: At the top le and right we plot the P * (k 2 ) FF as an example in the region between the rst and the second subtraction point k 2 0 = 0GeV 2 and k 2 1 = 5GeV 2 for the improved FF (P * ), the subtracted dispersion relation (P * dis 12 ) and the QCD-SR version (P * OPE ).As expected the QCD-SR version, which is based on the asymptotically valid OPE, does not satisfy the constraint P * (0) = 0. Other than at this point the QCD-SR and the dispersion relation version are rather similar where the improved version is very di erent of course because of the inclusion of φ-meson.On the bo om part we show the matching region on the le and a plot showing how the dispersion relation di ers from the QCD-SR version for k 25GeV 2 which is again expected since the polynomial nature of the subtraction points will invalidate its use as well as the on-set of the condensate terms.On the right we show the imaginary part where by construction Im(P * OPE ) = Im(P * dis 12 ) and the improved version is identical above the second matching point.

Figure 6 :
Figure 6: Plots of the ve FFs in the OPE region [5, 30]GeV 2 with formulae in (A.58) and denoted by F OPE .In three cases (the V ⊥, case are similar to T ⊥,L in that region) we plot the [0, 5]GeV 2 -region where the vector meson resonance is visible.For P * the reader can nd more plots in Figure5.

Table 2 :
[23]sults of recasting LHCb's analysis[23]to test avour-violating couplings of the axion to B s (F A bs ) and B d (F A bd 2 min and the corresponding best-t value for |F A bq |. e values of χ 2 min should be compared with the χ 2 value of the SM, χ 2 SM = 15.7.e axion best-t values are thus in roughly 1σ agreement with the SM.|F A bq,90% | are the resulting 90% CL exclusion limits.

Table 3 :
Projected 90% CL exclusion limits on the avour-violating couplings of the axion to B s (F A bs ) and B d (F A bd en referred to as the daughter sum rule, which we require to be satis ed within O(2%).Next s 0 ought to be in the window (m B + 2m π ) 2 and (m B + m ρ ) 2 which is compatible with (A.64).e Borel parameter is further constrained by requiring two standard criteria i) the condensate not to exceed 10% assuring convergence of the OPE ii) the continuum contribution not exceed 30% which assures that the B-meson is projected out rather than an entire set of states.Note that the later two conditions are exclusive in that the former requires a large and the la er a smaller Borel parameter.e values adapted for the thresholds are (s Borel parameter for P, T ⊥,L , V ⊥ is taken to be M 2 = 9(2)GeV 2 for |k 2 | < 10GeV 2 and M 2 = 7(2)GeV 2 for |k 2 | > 15GeV 2 with smooth interpolation and nally M 2 V = 6(2)GeV 2 .A global shi is applied for the B s Borel parameters M 2 2 Bq /M 2 ln P * (k 2 , M 2 , s 0 ) , (A.64) o