Flavoured axions in the tail of Bq → μ+μ− and B → γ* form factors

We discuss how LHC di-muon data collected to study Bq → μμ can be used to constrain light particles with flavour-violating couplings to b-quarks. Focussing on the case of a flavoured QCD axion, a, we compute the decay rates for Bq → μμa and the SM background process Bq → μμγ near the kinematic endpoint. These rates depend on non-perturbative Bq → γ(*) form factors with on- or off-shell photons. The off-shell form factors — relevant for generic searches for beyond-the-SM particles — are discussed in full generality and computed with QCD sum rules for the first time. This includes an extension to the low-lying resonance region using a multiple subtracted dispersion relation. With these results, we analyse available LHCb data to obtain the sensitivity on Bq → μμa at present and future runs. We find that the full LHCb dataset alone will allow to probe axion-coupling scales of the order of 106 GeV for both b → d and b → s transitions. As a spin-off application of the off-shell form factors we further analyse the case of light, Beyond the Standard Model, vectors.


Introduction and motivation
Open questions in particle physics and cosmology may well be addressed by very light particles that interact only feebly with the Standard Model (SM). The prime example is the QCD axion [1,2], which is not only predicted within the Peccei-Quinn (PQ) [3,4] solution to the strong CP Problem, but which can also explain the Dark Matter abundance if it is sufficiently lighter than the meV scale [5][6][7]. In the past years much activity has been devoted towards experimental searches for the QCD axion, and multiple proposals for new experiments are underway to complement ongoing efforts to discover the axion, see ref. [8] for a review. While most axion searches rely on axion couplings to photons, the axion also couples to SM fermions if they are charged under the PQ symmetry. Generically, these charges constitute new sources of flavour violation, which induce flavour-violating axion couplings to fermions, which can thus be probed by precision flavour experiments. For instance, this situation arises naturally when the PQ symmetry is identified with a flavour symmetry that shapes the hierarchical structure of the SM Yukawas [9][10][11][12], therefore, connecting the strong CP problem with the SM flavour puzzle. Even in the absence of such a connection, axion models with flavour non-universal PQ charges can be easily constructed and motivated by, e.g., stellar cooling anomalies that require suppressed axion couplings to nucleons [13][14][15].
In the absence of explicit models, the couplings of the axion to different flavours are a priori unrelated, and are parametrised by a model-independent effective Lagrangian for Goldstone bosons. The flavour-violating couplings in the various quark and lepton sectors can then be constrained by experimental data, see ref. [16] for a recent assessment of the relevant bounds in the quark sector using mainly hadron decays with missing energy. In this article we explore a novel direction to probe flavour-violating axion couplings involving b-quarks using the present and future LHC data collected to study B q → µµ.
We therefore focus on flavour-violating b → q transitions, which are described by the Lagrangian where F V /A bq are parity odd/even couplings, q = d, s and a denotes the derivatively coupled QCD axion, whose mass is inversely proportional to the axion decay constant, f a , which suppresses all axion couplings. The decay constant has to be much larger than the electroweak scale to sufficiently decouple the axion from the SM in order to satisfy experimental constraints [17,18]. This implies that the axion is light, with a mass much below an eV, and stable even on cosmological scales. Therefore, two-body B-meson decays with missing energy, which closely resemble the very rare SM decays with final-state neutrino pairs that have been looked for at B-factories, stringently constrain the couplings in eq. (1.1). The resulting constraints on the vector couplings F V bq (from B → K/πa decays) and the axial-vector couplings F A bq (from B → K * /ρa decays and B q mixing) have been given in refs. [16] (see also refs. [19,20]) and are summarised in table 1. Note that constraints from neutral meson mixing are typically JHEP09(2021)139 F V bq [GeV] F A bq [GeV] bd 1.2 · 10 8 (B → πa) 4.8 · 10 6 (B −B mixing) bs 3.1 · 10 8 (B → Ka) 1.3 · 10 8 (B → K * a) Table 1. Lower bounds on F V,A bq at 90% CL from B-decays and B q -mixing, taken from ref. [16]. much weaker than the ones from decays to vector mesons, except in the case of b → d transitions. This is mainly due to the lack of experimental data on B → ρνν suitable for the two-body recast.
In the present work, we investigate whether the couplings in eq. (1.1) can also be constrained at the LHC. To this end, we propose to use the three-body decays B s,d → µµa, where the muon pair originates from an off-shell photon, cf., figure 1 (left). With the main goal of measuring the SM decay B q → µµ, the ATLAS [21], CMS [22] and LHCb [23] collaborations have collected di-muon events with an invariant mass q 2 down to roughly (5 GeV) 2 . As long as no vetos on extra particles in the event are applied, these datasets can be used to constrain decays with additional particles in the final state, e.g., the radiative decay B q → µµγ, as proposed in ref. [24]. In this paper we focus on the LHCb potential although CMS and to some degree ATLAS are capable of a similar study. From the reduced mass resolution compared to LHCb, a reduced sensitivity can be inferred [25]. Whereas Belle II has an exciting physics program it is not competitive in very rare decays of B 0 s mesons [16,26]. Here, we point out that the same datasets can be used to constrain the decays B q → µµX, where X is a neutral, beyond-the-SM (BSM) particle with a mass that is sufficiently small to be kinematically allowed at the tail of B q → µµ, i.e., m X m Bq − 5 GeV ≈ 300 MeV. In this respect, the radiative decay B q → µµγ merely constitutes a SM background, which we take into account in our analysis. Note that the axion hypothesis (1.1) itself has a negligible effect on B q → µµγ and thus the SM prediction of that decay is considered. In particular, we suggest that when the measurement of B q → µµγ becomes feasible in the future, it can be directly interpreted in terms of constraining BSM particles that replace the final state photon. A similar strategy can be applied to s → d transitions, using for example the di-muon data collected at LHCb to study K S → µµ, cf., ref. [27], and possibly also to c → u transitions, i.e., D → µµ [28].
In the following we focus on the case of the invisible QCD axion, a, but our analysis can be readily extended to other particles appearing in the final state, as long as they are not vetoed in the event. In particular these could be heavy axions decaying within the detector, i.e., axion-like particles (ALPs). We expect such an analysis to be fully inclusive, that is, independent of the ALP decay mode. Similarly our proposal can be extended to constrain light vectors with flavour-violating couplings, e.g., dark photons or Z s. We explore these scenarios in appendix B. In this article we demonstrate the key elements of the analysis and perform the first sensitivity studies based on the published dataset of the LHCb collaboration. The ATLAS and CMS data can be analysed analogously.
The photon off-shell form factors are necessary for predicting branching fractions of B q → X where X is any of the above mentioned light BSM particles. We discuss the ℓ ℓ γ(k) Figure 1. The diagram to the left is the main axion process B q → a whereas the two diagrams in the centre and the right belong to the B q → γ background. The single and double lines stand for the q and b-quark, respectively. The left and central diagrams depend on off-shell form factors in the sense that the photon that emits the two leptons is off-shell. Diagrams in which the photon couples to b-quarks are not shown, but are analogous. Also diagrams with Q 9,10 -operator insertions are not shown, and resemble the diagram on the right and are proportional to C 9 V ⊥ and C 10 V . complete set of form factors, relevant for the dimension-six effective Hamiltonian, compute them with QCD sum rules (SRs) and fit them to a z-expansion. In addition the off-shell basis is shown to be related to the standard B → V = ρ 0 , ω, φ . . . basis through a dispersion representation, which interrelates many properties of these two sets of form factors.
This article is organised as follows: in section 2 we provide the differential rates for the axionic decay B q → µµa and the radiative decay B q → µµγ. In section 3 we provide the tools necessary to perform the analysis and use available background estimates and data from LHCb's B s → µµ measurement to evaluate the sensitivity to B q → µµa at present and future runs. We conclude in section 4. Appendix A contains the computation of the B → γ ( * ) form factors as well as their extension through a dispersion relation to the region of low lying vector mesons. In appendix B we provide the differential rate for B → γ * (→ + − )V and apply the proposed analysis.

Differential decay rates
In this section we calculate the differential rates for the axionic B q → a and radiative B q → γ decay channels. In figure 1, we show on the left the diagram for the axionic decay and in the centre and on the right representative diagrams for the radiative decay. The rates are differential in the lepton-pair momentum q ≡ p + + p − , and depend on non-perturbative B q → γ ( * ) form factors with on-or off-shell photons, which we briefly introduce before presenting the differential decay rates. Finally, we evaluate the rates close to the kinematic endpoint (4.9 GeV) 2 q 2 < m 2 Bq , and compare our prediction for the radiative decay to results in the literature.

Summary on B q → γ * form factors
We describe B q (p B ) → γ * (k) transitions with off-shell photons by a set of form factors with two arguments F * (q 2 , k 2 ) ≡ F B→γ * (q 2 , k 2 ). The first argument (here q 2 ) denotes the momentum transfer at the flavour-violating vertex while the second argument (here k 2 ) denotes the momentum of the photon. For on-shell photons, i.e. k 2 = 0, these form factors JHEP09(2021)139 reduce to the well-known on-shell form factors F (q 2 ) ≡ F * (q 2 , 0) given in eq. (A.22). A complete set of form factors is given by 1 denotes the momentum transfer at the flavourviolating vertex, and we define the off-shell photon state γ * (k, ρ)| as in eq. (A.2). The coefficients where G αβ ≡ g αβ − k α k β /k 2 and (kinematic) hatted quantities are divided by m 2 Bq , e.g.
The matrix element satisfy the QED and the axial Ward identities The latter implies the relation between the pseudoscalar and one of the axial form factors which reduces the number of independent form factors down to a total of seven. At q 2 = 0 there are two further constraints thereby reducing the form factors down to five. An extensive discussion including dispersion representations in the q 2 and k 2 variables, the derivation of eq. (2.5), the limit to photon on-shell form factors, and their computation from QCD SRs are deferred to appendix A. As an example let us quote the once-subtracted 1 The scalar form factor, γ * (k, ρ)|qb|Bq(pB) , vanishes due to parity conservation of QCD. It is important to keep in mind that these are theB → γ * form factors. E.g. if one assumes the phase convention C|B = |B , where C is the charge transformation, then the axial form factors (all but the ⊥-ones) change sign: (V ,L , T ,L , P ) B→γ * = −(V ,L , T ,L , P )B →γ * .

JHEP09(2021)139
dispersion representation for the pseudoscalar form factor for which simplifies for the subtraction point k 2 0 = 0. Above f em φ is the electromagnetic decay constant in the appendix A.2 and A 0 is the pseudoscalar B s → φ form factors [29]. In the last step the dispersion relation was evaluated for the lowest term and the dots stand for terms higher in the spectrum e.g. B s → φ and multiple particle configurations.
The off-shell form factors in the limit of small momentum transfer at the flavourviolating vertex, T * ⊥, ,L (0, q 2 ), V * ⊥, ,L (0, q 2 ) and P * (0, q 2 ) are computed in this work for the first time. 2 The B → γ * (→ + − )V differential rate, presented in appendix B, is another example application of these off-shell form factors for searches beyond the SM. The extension of the off-shell form factors to low k 2 into the resonance region, as shown in (2.6), requires the matching to the QCD SR result. The discussion thereof is delegated to appendix A.4 and involves, by choice, a multiple subtracted dispersion relation where additionally use is made of the on-shell form factors. Plots are shown in figure 6. A Mathematica notebook is attached to this paper as supplementary material for reproducing the form factors. This is for example relevant for the prediction of B → γ as the two form factors T * ⊥, (0, k 2 ), where k 2 is the dilepton mass, enter the description. These two form factors have also been considered in [32][33][34] using improved vector meson dominance.
For the on-shell form factors B → γ we use the next-leading-order (NLO) light-cone sum rule (LCSR) computation [35]. Note that the QCD SR result of the off-shell form factors can be used in the relevant kinematic region (4.9 GeV) 2 q 2 < m 2 Bq since thresholds are far away. The photon on-shell form factors are more challenging in this region because the light-cone expansion breaks down. They can, however, be extrapolated to this region by using a B * q and B q1 -pole ansatz, with the residue computed from LCSR [36], supplemented with z-expansion corrections to account for further states.

The B q → a differential rate
Given the effective Lagrangian in eq. (1.1), the amplitude forB q (p B ) The weak annihilation process, that is B → V γ * four-quark matrix elements where the B valence quarks annihilate, contain some of these form factors as sub processes. Weak annihilation has been computed in the SM to leading order (LO) in QCD factorisation [30] and including all BSM operators in LCSR [31]. However, the discussion in our paper is more complete as even the BSM computation in ref. [31] does not include all form factors since the V -mesons do not couple to scalar operators for instance. Moreover in ref. [32], the off-shell form factor T * ⊥ (0, k 2 ) = FT V (0, k 2 ) is evaluated using a vector-meson-dominance approximation. 3 Notice the interchanged role of k and q with respect to the definition of the form factors in eq. (2.1).

JHEP09(2021)139
where A µµa ≡ µµa| − L int |B q , q ≡ p B − k = p + + p − and Q = −1 denotes the lepton charge. After squaring this amplitude, summing over fermion spins, and integrating over the unobserved axion momentum, the differential rate in the invariant mass of the final-state leptons, q 2 , becomes , and λ(x, y, z) ≡ x 2 + y 2 + z 2 − 2xy − 2xz − 2yz is the Källèn function. For our work it is sufficient to approximate m a → 0.

The B q → γ differential rate
The relevant part of the effective SM Lagrangian 4 9,10 , and Q 7,9,10 is obtained from Q 7,9,10 by the replacements L → R and m b → m q . Upon using the helicity vector completeness relation (A. 19 Bq , q 2 , 0) and r q ≡ −is e e/2 λ can be factored into a leptonic L V,A λ = * ν (q, λ)ū(p − )γ ν [γ 5 ]v(p + ) and the hadronic helicity amplitude follows from the Hamiltonian with where the leptons are removed up to a normalisation factor. For the effective axial leptons one finds and for the vector ones By including the factor se in the definition of the operators Q7, Q 7 we ensured that the sign of their Wilson coefficients, C SM 7 < 0 is independent of the definition of the covariant derivative.

JHEP09(2021)139
where M T (5) µρ (q, k) ≡ M T (5) µρ (q, k) + M T (5) µρ (k, q) takes into account both types of diagrams in figure 1. Explicit parameterisations of the polarisation vectors and some more explanations can be found in appendix A.1.3. Using the (common) convention (2.14) The last equality relates T * (0, q 2 ) to T * ⊥ (0, q 2 ), see appendix A.1.6 and footnote 7 just before eq. (A.1). Above we omitted the contribution from photons radiated off final-state muons, because these are obtained from the B q → µµ rates using PHOTOS, cf., ref. [37]. Going slightly lower in q 2 would necessitate the inclusion of broad charmonium resonances [38,39]. For an overview of other non form-factor matrix elements see for instance refs. [32,38].
After integrating over the unobserved photon momentum, the differential rate for the radiative mode B q → γ reads (2.15) and c V ≡ (q 2 + 2m 2 ), c A ≡ (q 2 − 4m 2 ) are effectively the squared leptonic helicity amplitudes.

B q → µµa and B q → µµγ close to the kinematic endpoint
To illustrate the relative importance between the SM background B q → µµγ and the B q → µµa signal we take as a reference value for the flavour-violating coupling F A bq = 10 6 GeV. In figure 2, we show the differential rate normalised with respect to the two-body decay width In the left panel, we show the predictions for the B s decays and in the right the corresponding ones for the B d case. The binned (green) predictions are the B q → µµ rates including photon radiation from the final-state muons using PHOTOS (see ref. [37]). The red solid lines are the rates from the axion mode for the reference value F A bq = 10 6 GeV (note that the relative enhancement between left and the right panel is due to the normalization, which carries a different CKM suppression.). The black lines are the B q → µµγ predictions when the photon does not originate from muon bremsstrahlung. They depend on the treatment of the non-perturbative input, i.e., the hadronic form factors introduced in section 2.1. In all cases, we use the same perturbative input, namely the SM  For the axion predictions F A bq = 10 6 GeV is assumed as a reference value. The different black lines are the photon predictions with different form factor treatments (see legend and main text). In green are bins of the two-body B q → µµ rate including radiation from final-state muons. To better compare the B s and B d cases, all rates are normalised to their respective two-body decay B q → µµ, which is why the B d → µµa line appears enhanced with respect to the B s → µµa one.
Wilson coefficients C eff 7 , C eff 9 and C 10 evaluated at the hadronic B q scale. We obtain C 10 from ref. [37] and use flavio [40] to evaluate C eff 7 and C eff 9 . We show the results of three different approaches of estimating the relevant hadronic form factors: • Dashed line: the QCD SR form factor computation discussed in section 2.1 and appendix A, • Dotted line: the quark-model approach of ref. [32], • Dashed-dotted line: the pole-dominance approach supplemented by experimental data and heavy-quark effective theory of ref. [41]. It is specific to the B s case (left panel).
The agreement of the predictions is rather crude. For q 2 ≈ (4.9 GeV) 2 , our prediction is about a factor of three larger than the quark model [32] and about a factor of two smaller than the pole-dominance approximation [41]. The disagreement with the quark model is not surprising as the method is designed for low q 2 and, unlike in our work, no additional input is employed to constrain the residua of the leading poles near the kinematic endpoint. The agreement of the form factors themselves at lower q 2 , which we do not show, is much better. The comparison with the pole-dominance approach [41] has two major components. The difference in the B * q -residue and the fact that the effect of B q1 -resonance is neglected JHEP09(2021)139 in ref. [41] cf. appendix A.5.1. While it is important to understand 5 the origin of the discrepancy in light of a possible measurement of the radiative decay, the discrepancy does not play a significant role in obtaining a bound on the axion couplings F A bq , which we derive in the next section.

Sensitivity at LHCb
In this section we recast the LHCb analysis of ref. [23] to obtain an estimate for the current and future sensitivity of LHCb to probe the flavour-violating couplings F A bs and F A bd . We first discuss, in section 3.1, how we extract the backgrounds by rescaling the original LHCb analysis, and derive the expected number of events in each bin for a given luminosity. We then describe, in section 3.2, our statistical method and provide the recast of the present data and the sensitivity study for future runs. Our main results are summarised in tables 2 and 3.

Rescaling the LHCb analysis
The B s → µµ analysis of LHCb in ref. [23] makes use of datasets collected at different LHC runs, with luminosities L 7 = 1.0 fb −1 from 7 TeV, L 8 = 2.0 fb −1 from 8 TeV, and L 13 = 1.4 fb −1 from 13 TeV runs. Under the SM hypothesis, a total number of 62 B s → µµ events and 6.7 B d → µµ events are expected in this analysis in the full range of boosteddecision-trees (BDT) and the signal window (m µµ ∈ [5.2, 5.445] GeV). Since the BDT discrimination is flat one expects half of these events to pass the BDT > 0.5 selection. For this BDT selection, LHCb supplies a plot with backgrounds, which we use to extract their numerical values. By combining the expected number of B q → µµ events in the SM with the SM branching-fraction predictions, we extract a universal rescaling factor, r ≈ 0.079, via In these equations, i labels the √ s run and σ i is the corresponding b-quark production cross section in the acceptance of LHCb. The latter has been measured by LHCb for √ s = 7, 13 TeV, σ b,7 = 72 µb and σ b,13 = 144 µb [44]. For σ b, 8 we linearly rescale the 7 TeV value (σ b, 8 . f d and f s are the fragmentation ratios of b-quarks that are produced at LHCb and fragment into B d and B s , respectively. We absorb f d in the rescaling factor, r, and use the ratio f s /f d to obtain N Bs . This ratio has been measured by the LHCb collaboration to be f s /f d = 0.259 ± 0.015 [45]. Finally, summarises the experimental efficiencies and all other global rescaling factors, which we absorb into the definition of r.

JHEP09(2021)139
The quantities BR's in eq. (3.1) are the respective branching ratios in the signal window. This includes the effect of photon radiation from muons [37,46], which LHCb simulates with PHOTOS. The overline in the branching-ratio prediction indicates that the partial width is divided by the width of the heavy mass eigenstate (Γ H Bs , Γ H B d ) to obtain the branching fraction. In this way the effect of B q -mixing is included [37,47]. This is relevant for the B s system, but much less so for the B d system. This is numerically equivalent to LHCb's treatment of the effective lifetime, cf. eq. (1) in ref. [23].
LHCb's BDT > 0.5 selection covers the m µµ ∈ [4.9 GeV, m Bs ] region in bins of 50 MeV. We apply the same universal rescaling factor, r, to rescale the predictions of all B q → µµa and B q → µµγ branching fractions for all m µµ bins. This is a good approximation as there are no triggers or similar thresholds that significantly change the rescaling over this invariant-mass range. In the next section, we present the sensitivity of this analysis to probe the flavour-violating F A bs and F A bd axion couplings in future runs of LHCb by rescaling the 13 TeV dataset. We denote the corresponding effective total luminosity by At a given total luminosity, L, the expected number of events at a given m µµ -bin (Bin k ) then is

The quantity N BKG,analysis
Bin i is the expected total number of background events that do not originate from the radiative decay in the given bin. We obtain N BKG,analysis Bin i by digitising and integrating the plot of LHCb's BDT > 0.5 selection. In eq. (3.3) we kept separate the rate from photon emission from muons (B q → µµ(nγ)) and the rate from photon emissions from the initial state (B q → µµγ). In principle, the amplitudes interfere but the interference is tiny close to the B q threshold and we thus neglect it.

Recast and sensitivity analysis
To compute the sensitivity of the LHCb analysis in probing F A bs and F A bd , we must combine the information of all m µµ bins and include statistical and systematic uncertainties. We neglect the subdominant experimental systematic uncertainties but will include the theory uncertainties associated to the form factors entering the three-body rates. In what follows we always either turn on F A bs or F A bd , i.e., but will not let them float simultaneously. Each m µµ bin corresponds to an independent counting experiment that obeys Poisson statistics. Exclusion limits on F A bq are then obtained from a joined Poisson (Log)Likelihood. For a sufficiently large number of events, Poisson statistics are well described by Gaussian

JHEP09(2021)139
statistics and the Poisson (Log)Likelihood is equivalent to a χ 2 function of the NP parameter, i.e., F A bq : with i numbering the bins and q = s, d.
denotes the total number of events (background plus signal) for the value F A bq in a given bin, whereas N obs i is the observed number of events. For the recast we use the actual number of events observed by LHCb, read off from figure 1 in ref. [23]. To project the sensitivity for future LHCb runs we set N obs i to the number of events expected in the SM. The covariance matrix, V cov , incorporates statistical and systematic uncertainties in a way that we discuss below. If we neglect systematic uncertainties, this matrix is diagonal and only contains the squared Poisson variances, We have explicitly checked, that for the data samples considered here, the Poisson (Log-)Likelihood is always very well approximated by the χ 2 .
To incorporate systematic/theory uncertainties we follow the commonly used approach of ref. [48]. Theory uncertainties are then treated as Gaussian uncertainties smearing the expectation values of the underlying Poisson probability distribution functions. We can then obtain the limits on F A bq by generating Monte-Carlo events based on the joined Poisson likelihood after smearing the expectation values by the (correlated) systematic errors. If the measurement is well-described by Gaussian statistics (as in our case) and the systematic uncertainties are small with respect to the statistical ones, this treatment of uncertainties is equivalent to adding the statistical and systematic errors in quadrature in V cov .
In our case the main systematic uncertainties are due to the form factors that enter the radiative B q → µµγ and the B q → µµa rate. Since the uncertainties in the form factors originate in part from uncertainties in input parameters like m b and qq that are q 2 -independent, the predicted number of events among different bins are correlated. Therefore, the full covariance matrix for the case in which the axion has a coupling F A bq is not diagonal and decomposes into Here, (V stat ) ij = δ ij N i are the statistical uncertainties, while the matrices V γ , V q a , and V q a−γ describe the correlated errors among the predictions of various rates over the bins. Aside from trivial functional dependencies on global rescaling factors, e.g., luminosity, we can determine them once and for all by generating Monte-Carlo events in which we vary the parameters on which the form factors depend. In practice we use the mean values of the z-expansion fit (of degree four) and their covariance matrix (see appendix A.5.3) to determine each piece of V cov . Using the covariance matrices we obtain the 90% Confidence Level (CL) exclusion limit on |F A bq |, i.e. χ 2 (F A bq,90% ) − χ 2 min = 1.64. First, we recast the observed data of LHCb's analysis [23] in which L = L = 4.4 fb −1 . The measurement is dominated by statistical uncertainties, but for purposes of illustration we show both the bounds when combining statistical and systematic theory errors and the bounds when only the statistical uncertainty is included. In the χ 2 we include the first JHEP09(2021)139 50 100 150 200 250 300 . Projected sensitivity of LHCb to probe the flavour-violating axion couplings F A bs (filled red region) and F A bd (hatched region) as a function of the total integrated luminosity. Shown are the 90% CL exclusion limits assuming that the observed number of events will be the same as predicted in the SM hypothesis.
ten bins of the LHCb analysis. The observed data are in good agreement with the SM expectation. Indeed, we find that the χ 2 of the SM divided by the ten degrees of freedom of the The best-fit points for the axion lies roughly 1σ off the SM. In table 2 we list the best-fit points with their corresponding χ 2 min , as well as the resulting 90% CL exclusion limits on |F A bs | and |F A bd |. Next we make projections for future runs of LHCb. As discussed in section 3.1, to this end we rescale the 13 TeV events assuming LHCb will collect a total of 300 fb −1 . To compute the sensitivity we assume that LHCb will observe exactly the number of events expected from the SM. Therefore, the best-fit point always corresponds to observing zero events from axion decays and χ 2 min = 0. For the projection study we present the results both when only statistical uncertainties are considered and when they are folded with the correlated theory uncertainties. In figure 3 we show the resulting 90% CL exclusion limit on |F A bs | (left panel) and |F A bd | (right panel) as a function of the total luminosity. In addition, the limits for some indicative luminosities are listed in table 3.
Note that the limit from the actual recast is weaker than the expected limit under the background-only hypothesis. More precisely, if we consider the case L = 4.4 fb −1 and set N obs i = N SM i (as we do for the projection study) we find for the statistics-only case |F A bs,90% | < 3.0 · 10 5 GeV and |F A bd,90% | < 4.0 · 10 5 GeV. In comparison, the corresponding exclusion limits of the recast (table 3) are slightly weaker. The origin of this difference is mainly an excess of roughly 10 events in the first bin of the current LHCb B s → µµ analysis, which can be fitted by the best-fit point of an axion signal. However, as discussed in the recast the excess is not statistically significant and the best-fit point of the axion is within 1σ of the SM.   ). The analysis employs a total of 4.4 fb −1 of data from runs at 7, 8, and 13 TeV. In the columns labelled "sys+stat" we combine statistical and theory uncertainties, while in the columns labelled "stat only" we neglect the latter. We see that presently the bounds are dominated by statistical uncertainties. When computing the χ 2 we sum over the ten first bins of the analysis, i.e., m µ + µ − ∈ [4.9 GeV,   Table 3. Projected 90% CL exclusion limits on the flavour-violating couplings of the axion to B s (F A bs ) and B d (F A bd ) as a function of the integrated luminosity at LHCb. In the columns labelled "sys+stat" we combine statistical and theory uncertainties, while in the columns labelled "stat only" we neglect the latter.

Summary and outlook
In this article we have proposed a novel method to probe flavour-violating couplings of the QCD axion to b-quarks at the LHC, exploiting the di-muon datasets collected for the B q → µµ analyses. To this end, we have computed the relevant differential decay rates for the decay of a B q -meson to muons and an axion B q → µµa [eq. (2.8)] and the radiative decay B q → µµγ [eq. (2.15)], which is a background to the former process.
These rates depend on non-perturbative B q → γ ( * ) form factors, which we have discussed from a general viewpoint, computed with QCD sum rules (at zero flavour-violating momentum transfer). To the best of our knowledge this is the first discussion of the complete set of form factors, for the dimension-six effective Hamiltonian H b→(d,s) eff , supplemented with an explicit computation of all form factors in appendix A. The uncertainty of this leading order computation is estimated to be around 25% and could benefit from JHEP09(2021)139 radiative corrections which is, however, an elaborate task. We extend these form factors to the low-lying resonance region of unflavoured vector mesons by using a multiple subtracted dispersion relation cf. appendix A.4. A Mathematica notebook is appended to the arXiv version. Besides being useful for axion searches these form factors are also the ingredients for other light BSM particle (e.g. dark photon) searches. In addition, we have exposed the relation between the introduced basis and the standard B → V basis through the dispersion representation in appendix A.2, which interrelates form-factor properties of the two bases. A further application of the off-shell form factor formalism is to consider the charged case with q 2 , k 2 > 0 which is needed to describe B → + − ν. We postpone this task to a future study as this involves discriminating charges and dealing with contact terms but note that these decays has recently been described by different groups [49][50][51][52] using different approaches.
With these decay rates we performed a recast using available LHCb data and estimated the sensitivity to B q → µµa at present and future runs, taking into account the SM background B q → µµγ. We find that present data constrain the relevant axion couplings F A bd (F A bs ) to be larger than 3.0(2.4) · 10 5 GeV at 90% CL [table 2], while the full LHCb dataset will probe scales of the order of 10 6 GeV in both b → d and b → s transitions (table 3).
For stable axions, these results should be compared with the ones derived from Bmeson decays with missing energy. In the case of b → s transitions, the data from the BaBar collaboration on B → K * νν provide constraints that are roughly two orders of magnitude stronger than the ones from our LHCb recast of B s → µµa, cf. table 1. For the case of b → d transitions, the BaBar constraints are roughly of the same order than the ones that LHCb can obtain in upcoming runs. Nevertheless, the combination with the corresponding ATLAS and CMS analyses of B q → µµ may improve the bounds significantly. In appendix B we made further use of the off-shell form factors to estimate the strength of LHCb to search for light, Beyond the Standard Model, vectors.
While it is remarkable that the LHC can play a role in constraining couplings of the QCD axion, the analysis of B q → µµa that we have presented here can be relevant for other extensions of the SM with light neutral particles with flavour-violating couplings. Since the B q → µµa analysis is inclusive, it can be extended to search for light BSM particles even if they decay within the detector. For example, an ALP that decays promptly to, for instance, photons may be subject to cuts on additional photons in the analyses of B → K(a → γγ) at the B-factories and thus evade detection, while it would be kept in the B q → µµ(a → γγ) samples at the LHC. Therefore, the analysis that we have presented here complements axion searches in rare meson decays with missing energy at B-factories, and can play an important role in constraining flavour-violating couplings of light particles.

A The B q → γ * form factors
The standard B q → V matrix elements (ME), where V = ρ 0 , ω, φ . . . is a vector meson, hold some analogy with the B q → γ * ones. However, the difference is that the analogue of the vector-meson mass is the photon off-shell momentum which is a variable rather than a constant. Hence the MEs are functions of two variables and this leads to a more involved analytic structure. In this paper we restricted ourselves to the kinematic region q 2 ∈ [(4.9 GeV) 2 , m 2 Bq ], where the form factors (FFs) can be expected to dominate over long-distance contributions.
This appendix is structured as follows. Firstly, we define and state relation and limits of the FFs in section A.1, the link with the B → V basis is discussed in section A.2, the QCD SR computation of the off-shell FFs follows in section A.3 and finally we turn to the FF-parametrisation and fits in section A.5. Note, that sections A.1, A.2 and A.5 are independent of the method of computation.

A.1 Definition of B → γ ( * ) form factors
We introduce a complete set of off-shell FFs which is related to the standard B → V basis [29,54] via dispersion relations, cf. section A.2. On a technical level this appendix extends previous work [32,53], in that we discuss the full set of seven vector and tensor FFs and not only those needed for the SM transition. The complete basis is for example useful for other invisible particle searches such as the dark photon. The off-shell FFs are not to be confused with the on-shell FFs which have received more attention in the literature [32,33,53,55]. An overview of the on-and off-shell FFs used for this paper are shown in the diagrams in figure 1 and contrasted in table 4.

A.1.1 The complete basis of seven off-shell form factors F * (q 2 , k 2 )
We introduce the FFs with two momentum squares q 2 and k 2 collectively as F * (q 2 , k 2 ) ≡ F B→γ * (q 2 , k 2 ). The first argument (here q 2 ) denotes the momentum transfer at the flavourviolating vertex while the second argument (here k 2 ) denotes the momentum of the photon emitted at low energies.
We introduce a new off-shell basis via a dispersion representation based on the standard B → V basis [29,54]. Below we state the basis before turning to the construction in JHEP09(2021)139 Defined in eqs. (left) (centre) (right) Other notation - Table 4. Overview of FFs referencing definitions, graphs, and analytic structure. The latter defines the region of validity of the computation. Long-distance contributions are relevant in other kinematic regions [32,38]. For B d → γ * , m 2 φ is to be replaced by m 2 ρ,ω above.
section A.2. The absence of unphysical singularities in the matrix element enforces relations between FFs which we discuss in some detail. We will refer to this circumstance as "regularity" for short. The complete set of FFs were already introduced in the main text in eq. (2.1) and reproduced here for convenience 6,7 where j ρ = f Q ff γ ρ f is the electromagnetic current.

A.1.2 Constraints for off-shell form factors
The QED Ward identity holds off-shell in the form without contact term since the weak operator is neutral in the total electric charge. Note that eq. (A.6) is automatically satisfied in our parametrisation since k ρ R µρ ⊥, ,L,P = 0. The non-singlet axial Ward identity for M µρ A assumes the form which in turn holds without contact term since the electromagnetic current is invariant under non-singlet axial rotations. Eq. (A.7), upon using q µ R µρ ⊥, ,L = 0, implies that Regularity enforces constraints on the FFs defined in (A.1). 8 There are two constraints at q 2 = 0 and k 2 = m 2 Bq respectively. The Ward identity (A.8) enforces The second constraint is There are two further constraints due to the parametrisation of the form factors at k 2 = m 2  A similar constraint to (A.11) was reported in ref. [32] and we comment in the same section in what way it differs from ours.

A.1.3 The four photon on-shell form factors F (q 2 ) ≡ F * (q 2 , 0)
We next turn to the case where the low-energy photon is on-shell; k 2 = 0. We introduce the commonly used shorthand 0)). The basic physics idea is that the absence of the photon's zero helicity component implies the vanishing the pseudoscalar FF and the zero helicity part of the vector FFs. We may define the helicity amplitude for B → µµγ by One then obtains the two B → γµµ helicity amplitudes 15) and the ± direction obey 16) and the formulae for the vector V as well as the tensors T, T 5 are similar. The common definition explains the notation. In order to obtain these results it is useful to choose an explicit helicity vector parameterisations. E.g. in the B q -meson restframe (e.g. [eq. (15)] in [56]) where the velocity is given by v ≡ | k| = λ 1/2 (m 2 Bq , q 2 , k 2 )/(2m Bq ) and k → k µ is the Lorentz-index convention in the formulae above. This helicity basis is useful since with G λλ = diag(1, −1, −1, −1) and the first entry denotes the t-component. Second, in the limit k 2 → 0, * (k, 0) ∝ k, and this enforces,

JHEP09(2021)139
and reduces the seven FFs of eq. (A.1) to four. Alternatively one can infer the constraints (A.21) from the regularity of the matrix elements as k 2 → 0. The regularity condition and the helicity arguments are clearly related. For completeness we give the explicit k 2 → 0 basis [38,55] Relation to the standard B → V basis. We consider it worthwhile to comment on some aspects in the standard basis of B → V FFs e.g. [29]. The k 2 → 0 limit is then akin Such relations were noted previously. Firstly, in the B → V context in ref. [57] in appendix A and around eq. [5] in ref. [31], where it is argued that the relation has to hold in order to cancel a kinematic 1/m V -factor. Second for B → γ (m V = 0) they were previously reported in ref. [53] as a consequence of regularity.

A.1.4 The five form factors F * (0, k 2 ) at zero flavour-violating momentum transfer
In the process B q → X, with X a light BSM particle, the limit in which the flavourviolating momentum transfer goes to zero, i.e., q 2 = 0, corresponds to the case of zero or small mass of X. In this limit the two constraints in eqs. (A.9) and (A.11) reduce the number of independent FFs from seven to five. The matrix elements, at q 2 = 0,the become Table 5. The J P = 0 + FF vanishes by parity conservation of QCD. Generally, there are seven independent F * (q 2 , k 2 ) FFs (light-blue) with two constraintsV * L (0, k 2 ) = P * (0, k 2 ) (A.9) and T * (0, k 2 ) = (1 −k 2 )T * ⊥ (0, k 2 ) (A.11). For the photon on-shell case, F (q 2 ) ≡ F * (q 2 , 0), there are four independent FFs (light-red) and the reduction is due to the absence of the photon 0-helicity component. At zero flavour-violating momentum there are five independent FFs (light-green), due to the two constraints mentioned above. For the computation of the B → γ SM rate, the following five FFs are sufficient {V ⊥, (q 2 ), T ⊥, (q 2 ), T * ⊥ (0, k 2 )}.

A.1.5 Counting form factors
Since the last few sections were a bit involved with many steps we summarise the classification in table 5. In general there are seven FFs for the B → 1 − transition. In the photon onshell case this reduces to four because the photon comes with two polarisations only. In the case of zero flavour-violating momentum transfer the two general constraints (A.9), (A.11) reduce this number from seven to five.

A.1.6 Derivation of T
At last we turn to the derivation of the relation (A.11). We may choose to proceed by uncontracting the B → V matrix element, first in q ν (A.1) with shorthands x i = x i (q 2 , k 2 ), p = p B , η is the polarisation vector of a massive vector boson and square brackets denote antisymmetrisation in the respective indices. The corresponding uncontracted B → γ * matrix element then reads with c some i-independent kinematic function (X i = cx i ) which is irrelevant for our purposes. The appearance of the tensor G ρα can be understood from the viewpoint of a dispersion relation cf. section A.2 or and footnote 11. Regularity enforces at k·q ∝ 1−k 2 −q 2 → 0 , These two constraints are generally valid. We may make the connection with our basis by identifying There are two consequences of this equation. Since (0) in the notation of [29,54] and has been derived in the off-shell FF context in reference [53].

A.2 Relation of the B → γ * -and B → V -basis through the dispersion relation
In this appendix we make the link between the B → V -and the B → γ * -FFs through the dispersion relations. This is an instructive exercise and we will be able to recover properties of the B → γ * FFs from the B → V -ones. Our argumentation remains true if one considers any intermediate state (e.g. two pseudoscalar particles in a P -wave) as long as its quantum number, J P C = 1 −− , is equal to the one of the photon. This is the case since the properties follows from the general decomposition and the fact that any such state can be interpolated by the electromagnetic current in the LSZ formalism. In addition the dispersion representation may be useful for improving the fit ansatz of these FFs.

JHEP09(2021)139
For our purposes it is convenient to first write the B → V FFs [29,54] in the following form 10 where η is the vector-meson polarisation, P µ i are Lorentz vectors 11 where in (A.35) but not (A.36) k 2 = m 2 V is assumed, and with the more traditional FFs A B→V 0,1,2,3 (e.g. [29]) and V B→V is as follows respectively. The constraint (A.12) does not apply since m 2 V = m 2 Bq . As stated above the relation between the FFs becomes apparent in the dispersion representation (cf. the textbook [58] or the recent review [59]). A specific example is chosen for illustration, 12 such that the matrix elements are invariants under η → η + k for on-shell k 2 = m 2 V . This follows from the LSZ formalism. 12 In order to distinguish the various dispersion representations throughout this paper, we use the variables (s, t, u) for the momenta (p 2 B , q 2 , k 2 ).
In order to further illustrate we resort to the narrow width approximation (NWA) which can be improved by introducing a finite decay width or better multiparticle states of stable particles (cf. remark at beginning this section). With the NWA λ=−1,0,1 and the spectral or discontinuity function ρ T i (q 2 , u) assumes the simple form where the dots stand for higher states in the spectrum. The residua r V T i are then given by of the electromagnetic current and the vector meson. In particular (A.45) Rewriting our parametrisation (A.1), in compact form, and equating with (A.40) we are able to identify the two bases where ω Ji is, by construction, a matrix with diagonal entries only. Of course we could have chosen any other basis at the cost of having a non-diagonal ω-matrix but we feel that this is an economic way. Let us make the dispersion representation more concrete by clarifying what the subtraction terms mean in (A.40). Whether or not the B → γ * FFs does require a subtraction is practically not important since it is advantageous to do so anyway as the on-shell FF,

JHEP09(2021)139
which provides the subtraction information, is well-known and also physical. 13 One may write , (A.49) where J =⊥, L, P and i = 1, 3, P with ω Ji defined above and the same formula applies for The T, V -FFs are a bit more involved since the matching factor ω 2 ∝ (1 −k 2 ) contain a non-trivial k dependence. In order to avoid the artificial pole we may define an expression , (A. 50) which establishes regularity at k 2 = m 2 Bq and the corresponding subtracted dispersion relation reads The same applies again for V * with the substitutions T * → V * and ρ T 2 → ρV 2 . The analogy with (A.49) is restored if one divides the latter equation by ω Ji .
For the sake of clarity, we give a few examples of FFs in the k 2 -dispersion representation (A.49) which illustrates some of its properties: (A.52) 13 The asymptotic behaviour of the triangle function at LO, cf. figure 4 (left, center), is O(ln k 2 ) as can be inferred from the explicit expressions in (A.58). This continues to hold when resumming the leading-log expressions, cf. ref. [60], for the correlation functions in question (with Nc = 3). The dispersion relation in p 2 B and its restriction to the finite interval [m 2 b , s0] leads to a F LOPT ∝ 1/k 2 asymptotic behaviour. It is possible that this changes at NLO. However, there are condensate terms of order F LO qq ∝ O(1) as can be inferred from (A.58) originating from diagrams where the photon is emitted from the b-quark. It might be that at NLO this turns out to be O(ln k 2 ). Note that, reassuringly, in the limit q 2 , k 2 → −∞ all condensate terms vanish as one would expect since perturbation theory dominates in this regime. At least when investigating the mixed quark-gluon condensate term one can see that it is suppressed relative to the condensate terms suggesting that the OPE itself converges. In passing let us note that the difference to B → V or B → γ FFs is the off-shellness of the photon which does not require a double cut for the FF and can lead to a more divergent expression.

JHEP09(2021)139
where the minus sign is due to the minus sign in (A.43) which in turn comes from the electromagnetic interaction term. Above the k 2 + i0 prescription has been dropped for brevity and |c ρ 0 | 2 = |c ω | 2 = 2 has been used. These These relations can be turned around since they hold for any k 2 , they necessarily hold at each point of the spectrum and thus for the B → γ * properties imply the B → V FF properties.
Moreover, the examples reveal that the slope of the FF are positive which is the choice by convention. This is the case since r φ > 0 and r ρ > |r ω | > 0. At last let us note that a particularly convenient form for P * can be obtained if ones chooses the subtraction point k 2 0 = 0 where the pseudoscalar FF vanishes. This corresponds to (2.6) in the main text given as an illustration. *  (0, k 2 ), T * ⊥,L L L (0, k 2 ) and V * ⊥, (0, k 2 ) The FFs are computed using QCD SRs [61]. The starting point is the correlation function of the form

A.3.1 QCD sum rule for the off-shell form factors P
x,y are analytic functions in three variables and the Lorentz structures R µρ are defined in (A.3). Gauge invariance, again, holds in the simplest form k ρ Π V µρ (p B , q) = 0 since we work with electrically neutral states. The term C µρ (q 2 ) is a contact term but of no relevance for our purposes since they are p 2 Bindependent. It is the correction to the naive non-singlet axial Ward identity (A.7). The operator J Bq ≡ (m b + m q )biγ 5 q is the interpolating operator for the B q -meson with matrix element B q |J Bq |0 = m 2 Bq f Bq . The QCD SR is then obtained by evaluating (A.54) in the operator product expansion (OPE) (cf. figure 4) and equating it to the dispersion representation. The OPE consists of a perturbative part and a condensate part for which we include only the quark condensate. The OPE is convergent, in a pragmatic sense, for momenta p 2 B , q 2 < O(m b Λ) and k 2 < −Λ 2 with Λ ≈ 500 MeV a typical hadronic scale. The perturbative part is evaluated with the help of FeynCalc [62,63]. We neglect light-quark masses i.e. m d = m s = 0. is not shown. It is proportional to qq /k 2 and implicitly assumes k 2 = 0. In the k 2 → 0 limit these diagrams are replaced by the photon distribution amplitude e.g. [35]. In the SR method the B-meson is projected out via a dispersion relation in the variable p 2 B giving access to the matrix element of the off-shell form factor F * (q 2 , k 2 ). The momentum assignment corresponds to the convention in the appendix which differs from the phenomenological discussion in the main text.

The dispersion representation of Π
where the dots stand for higher resonances and multiparticle states. Moreover the NWA for the B-meson has been assumed. The FFs are then extracted via the standard procedures of Borel transformation and by approximating the "higher states" contribution by the perturbative integral [61]. The latter is exponentially suppressed due to the Borel transform in p 2 B . Note, that the contact term C µρ (q 2 ), which can appear as a subtraction constant in the dispersion relation, vanishes under the Borel transform. Above πρ V ⊥ (s, q 2 , k 2 ) = Im[Π V ⊥ (s, q 2 , k 2 )] and M 2 is the Borel mass. If we were able to compute ρ V ⊥ exactly then V ⊥ (q 2 ), obtained from (A.56), would be independent of the Borel mass and it therefore serves as a quality measure of the SR. Other FFs are obtained in exact analogy with the exception of F (F = V, T ) Let us turn to a technical point. Namely on how to avoid spurious kinematic singularities. The constraints (A.12) avoid these constraints and in the computation they are satisfied provided that s = m 2 Bq . However, since that is only satisfied within O(1%) in a SR some care needs to be taken. This procedure is equivalent to considering the FF combination in (2.1) proportional to 1/(1 −k 2 ) as a single FFs and thus avoids this spurious pole which is the correct treatment. More concretely, let us introduce F aux = 1 1−k 2 F +q 2 1−q 2 F L and then define F = (1 −k 2 )F aux −q 2 1−q 2 F L . This procedure is equivalent to defining the JHEP09(2021)139 improved density as follows: Crucially, the extra 1/(1 − k 2 /s) does not render the density singular at that point which is equivalent to the statement that there are no 1/(1 − k 2 /m 2 B ) poles in the matrix elements. Let us emphasise that if the results carry statistical errors, such as in lattice Monte Carlo simulations then it might be advantageous to directly fit F aux and F L and then regain F from the formula above. This ensures cancellation of the pole in that case.
Before stating the results of the computations let us turn to the issue of analytic continuation. We would like to employ our FFs in the Minkowski region k 2 > 0, whereas the OPE is convergent for k 2 < −Λ 2 . The convergence is broken by thresholds at k 2 = 4m 2 q which signal long-distance effects corresponding to ρ/ω (φ)-like resonances cf. table 4. The standard procedure is to analytically continue into the Minkowski region and use the FF for say k 2 > 4 GeV 2 which is far enough from the lowest lying narrow resonances. For k 2 > 4 GeV 2 the resonances are broad and disappear into the continuum. Under such circumstances local quark-hadron duality is usually assumed to be a reasonable approximation. In our region of use k 2 ∈ [(4.9 GeV) 2 , m 2 B d,s ] there are no narrow resonances in the k 2 -channel. 14 On a pragmatic level it is best to implement the V B→γ * ⊥ (q 2 , k 2 + i0)prescription in the process of analytic continuation by deforming the path in (A.56) from where γ is a path in the lower-half plane starting at m 2 b and ending at s 0 . One may for instance choose a semi-circle in the lower half-plane. This prescription leads to numerical stability. Clearly our computation remains valid and useful for D 0 → γ * FFs with replacements B q → D 0 and m b → m c .

A.3.2 Explicit results for the off-shell form factors from QCD sum rules
The explicit FFs are found to be 14 In fact one should be able to use these computations up to k 2 ≈ 4m 2 b − O(10 GeV 2 ) at least.

JHEP09(2021)139
wherem Bq ≡ m Bq /m b ,s ≡ s/m 2 b ,k 2 ≡ k 2 /m 2 b and the perturbative densities are given bŷ and the improvement discussed around (A.57) reads in the case at hand Further note that the 1/k 2 factor is of no concern since P * (0, lead to imaginary parts in the FFs for k 2 > 4m 2 q = 0 and k 2 > 4m 2 b respectively. From the viewpoint of the original function these singularities are anomalous thresholds which consist of putting all the propagators on-shell. These expression are consistent with the B → V weak annihilation computation detailed in appendix of ref. [31] cf. footnote 2 for further remarks. Note that, there is no singularity at k 2 = s when expanded properly. The condensate contributions could be written in terms of the densities ρ i as well. The backward substitution e −m 2 b /M 2 qq → qq δ(s − m 2 b ) achieves this task. A few comments on interpreting the results. The k 2 → 0 limit is not well-defined for the condensates. In that limit the condensates originating from quark lines attached to the photon are replaced by a photon distribution amplitude which makes the FFs computation more involved. However, the perturbative part remains well-defined in that limit. Hence the latter must contribute positively to the {T ⊥, (0, 0), V ⊥, (0, 0)} by convention which can be verified indeed by using Q b = −1/3 and sendingk 2 → 0. The q 2 constraints (A.9), (A.11) are obeyed exactly by the SRs and are assumed as we do not shoŵ V L (0, k 2 ) and T (0, k 2 ); they are simply redundant. The constraints at k 2 = m 2 Bq (A.26) are obeyed for the correlation functions with k 2 = p 2 B . However they do not hold exactly for the FFs as p 2 B ≈ m 2 Bq within the approximation of the Borel procedure. We have checked that these relations hold to within 2% where for the last one we compare to a value of the FF at q 2 = 10 GeV 2 . In the fits we have implemented these constraints as they are important to cancels the poles present in the Lorentz structures R µρ and R µρ L .

JHEP09(2021)139
The expressions could be improved by m q = 0, adding the gluon condensate and radiative corrections. The first two are expected to be rather small effects since 1 m q /Λ QCD and m b qq G 2 . On the other hand, radiative corrections could be sizeable and would reduce the scale uncertainty considerably. This leads us naturally to the sum rule input and error discussion.

A.3.3 Numerical input to sum rules and rough uncertainty estimates
For a LO computation of the sum rule type discussed here the most important uncertainty will be due to the b quark mass and its associated scheme. Since the sum rule is effectively a ratio of two sum rules, various uncertainties cancel. Some guidance can be taken from the NLO computation of the B-meson decay constant. For the latter it is well-known that NLO corrections are moderate in the MS-bar mass-scheme [64,65] it is therefore advisable to use that scheme anticipating the smallest scale uncertainty. The LO expression are f Bq , f Bs ≈ (222, 244) MeV with the MS-bar mass m b = 4.18(4) GeV [66]. In fact the NLO corrections slightly reduce these values, unlike in other schemes, by less than 10%. Hence one can anticipate an uncertainty of that order. The other input are the condensates for which qq µ=2 GeV = −(0.269(2) GeV) 3 (e.g. [66]) is well-known from the Gell-Mann Oakes Renner relation and the strange quark condensate is taken from a lattice computation ss µ=1 GeV = 1.08(16) qq µ=1 GeV [67]. The SR specific parameters, the Borel mass and the continuum threshold, are determined by a series of constraints. First and foremost we impose these constraints at Euclidean values k 2 < 0 as for positive k 2 the logarithms can lead to large cancellations and invalidate the prcedure. We then proceed by analytic continuation k 2 → −k 2 to obtain the values for k 2 > 0 as found in the results of this paper. This is generally believed to be a reasonable approximation as long as k 2 is not in a resonance region per se (smooth QCD curves). An example where this works well is e + e − → hadrons in the region above the broad regions as can be inferred from the R ∝ σ(e + e − → hadrons)/σ(e + e − → µ + µ − )-plots in [66].
The first constraint comes from the formally exact relation, often referred to as the daughter sum rule, which we require to be satisfied within O(2%). Next s 0 ought to be in the window (m B + 2m π ) 2 and (m B + m ρ ) 2 which is compatible with (A.64). The Borel parameter is further constrained by requiring two standard criteria i) the condensate not to exceed 10% assuring convergence of the OPE ii) the continuum contribution not exceed 30% which assures that the B-meson is projected out rather than an entire set of states. Note that the later two conditions are exclusive in that the former requires a large and the latter a smaller Borel parameter. The values adapted for the thresholds are (s is taken to be M 2 = 9(2) GeV 2 for |k 2 | < 10 GeV 2 and M 2 = 7(2) GeV 2 for |k 2 | > 15 GeV 2 with smooth interpolation and finally M 2 V = 6(2) GeV 2 . A global shift is applied for the B s Borel parameters M 2 Bs = M 2 Bq + 0.5 GeV 2 , which respect the mass ratios of the meson roughly.
Let us turn to the uncertainty analysis. For the main section we use a z-expansion fit with similar uncertainty analysis as in ref. [29] with some more detail delegated to the fit-section. This analysis and fit is restricted to high q 2 . In addition we append a Mathematica notebook to the arXiv version for reproducing the plots. The brief comments on uncertainties apply to this version. The main sources of uncertainty are the Borel parameter and the scale uncertainty. The threshold uncertainty is negligible and the quark condensate uncertainty is only relevant for the B s -mode where its impact is still moderate in almost all cases and regions. The uncertainty due to NLO corrections is estimated to be 15% on grounds of the known radiative corrections to the B-meson decay constant as previously discussed in this section. The uncertainty of the Borel parameter is roughy 20%, which is on the conservative side, and when added in quadrature this amounts to a ≈ 25% uncertainty. An NLO computation would presumably not only reduce the scale but also the Borel uncertainty. We refrain from a more elaborate error analysis altogether including the extension in the resonance region. This is in principle not difficult to achieve in that one can at first just vary the input into the dispersion relation. This procedure ought to give a reasonable error estimate.

A.4 Extending the off-shell form factors below the O(2 GeV 2 )-region
The QCD SR results (A.58) of the last section are valid for k 2 > O(2 GeV 2 ). They are not valid below, since the low-lying vector meson resonances such as the φ-meson, in the B scase, distort the amplitude considerably. Conversely the 1/k 2 -factors of the condensates, which mimic these effects in the region of validity become singular in this limit. On the other hand a fair amount is known on the FFs in that region, as previously discussed, cf. appendix A.2 and (A.52) in particular, including the on-shell B → γ FFs and the B s → φ FFs. We advocate that optimal use of this knowledge can be made by using a multiple subtracted dispersion relation with input of this hadronic data and the use of the OPE from the QCD SR for sufficiently large k 2 . This is illustrated in figure 5.
We first begin by discussing the dispersion relation. We write the generic FF, denoted by F , as the sum of subtraction terms and the dispersion integral where the subscript "12" refers to a single subtraction point at k 2 0 and a double subtraction point at k 2 1 . The double subtraction ensures that even the derivatives is continous at the matching point. The subtraction term is given by  2 5 GeV 2 which is again expected since the polynomial nature of the subtraction points will invalidate its use as well as the on-set of the condensate terms. On the right we show the imaginary part where by construction Im(P * OPE ) = Im(P * dis 12 ) and the improved version is identical above the second matching point.
where the prime denotes the derivative, U (a, b, c) ≡ a−b c−b , and the dispersion integral reads where it was used that F is real somewhere on the real axis so that Schwartz's reflection principle applies. We emphasise that these expressions are exact as it is in essence based on partial fraction decomposition.
Let us turn to the specific treatment proposed. The first step is to write, in close analogy with (A.52), 15 15 This time we write the finite width as we are interested in numerics and not only the formal relations between FFs. This presentation could be further improved by writing the finite φ-meson as resulting from a dispersion integral, the effects are though negligible as the width is rather narrow.

JHEP09(2021)139
where the acronym QHD stands for quark-hadron duality in the sense of semi-global quarkhadron duality, used in QCD SR, with the difference that there is no Borel transformation but subtraction terms instead. The superscript PT stands for perturbation theory whereby we mean the SR computation where the condensates are dropped. The quantity u 0 is an effective threshold marking the onset of multiple particle states and the φ(1680)-meson (and the ρ /ω etc. in the B d -case). The residue r φ F is specific to the FF and can be inferred from the equations in appendix A.2. Finally we can make the outline sketched at the beginning of the section concrete 16

12
(k 2 ), and the acronym "hdis" stands for hadronic dispersion relation (which is the correct approach when used in the region of discontinuities), and the OPE superscript corresponds to the QCD-SR expressions found in (A.58) which is based on the perturbative and the condensate contributions. In the following we refer to (A.70) as the improved FF.

A.4.1 Some detail on the double dispersion relation including the numerical input
So far we have been somewhat formal on how to obtain the concrete dispersion relations within our specific computation. We first give the recipe for the explicit integrals before stating the sources of numerical input. We may parameterise the densities in (A.58) as follows ρ F = −q F L q − b F L b + rest, and then its discontinuity ρ D , which is formally a double discontinuity, is defined and given by 17 where q 2 = 0 is not shown explicitly for brevity. The subscript q and b indicate whether the photon is emitted from a q = d, s or a b quark. The purely dispersive F dis 12 (A.67) reads 16 If one wanted to continue the representation into the euclidean region then one could do a further matching at say k 2 = −3 GeV 2 and use the OPE SR result for lower values. For the extension above 4m 2 b one would need to introduce the Υ-resonances. 17 As previously stated the condensates are not to be included. They enter in the asymptotic formula in the OPE-region and for the matching at k 2 1 of course.

JHEP09(2021)139
where p ≡ e Bq Q b is a common prefactor. We have verified numerically that these relations work for all five off-shell FFs. The improved version in the resonance region is then given by with the improvement term (A.75) where the emission from the b-quark has not support in the [0, u 0 ] interval since the cut only starts at 4m 2 b u 0 . This completes the formal recipe. However, for the numerical implementation, replacing (A.73) by the following expression is even better suited as it avoids an integral to infinity. Above F PT is the SR expression without condensates, [δF 12 ] u 0 as in (A.74) and δF sub 12 is given by with F sub 12 given in (A.66). The expression (A.76), which is formally equivalent to the former one, make the continuity of the matching condition up to the first derivative manifest, Above we used [δF 12 (k 2 1 )] u 0 = 0 and the prime denotes again the derivative. The improved version is absolutely necessary when considering a decay rate as then the resonance, cf. figure 5, is effectively squared and one will otherwise not get a correct expression. There is no quark-hadron duality at the level of exclusive decay rates! However, quark-hadron duality applies at the level of amplitudes since they obey dispersion relations. One can therefore evaluate them away from the resonance region. Within the resonance region, a reasonable approximation is obtained when averaged over a sufficiently large interval that does not end in a resonance region. Thus averaging the FF over [0, k 2 1 ] in dk 2 should provide reasonable agreement between the QCD-SR and improved FF. Indeed we find that in all cases this average well within a factor of two. This is remarkable when one inspects the shapes and considers the uncertainties of all the numerical input per se. We stress that it is the improved FF that is expected to give the reliable average and the difference to the QCD-SR result is another reason for introducing the improved FF.

JHEP09(2021)139
We now turn to the required numerical input. For F sub 12 (k 2 ) we just need the matching points. For the first subtraction point we choose k 2 0 = 0 since the FFs are known at this point from B → γ at zero momentum transfer in the weak matrix element. For the five FFs at hand they are given by  (14)) [35] are used. The second matching point is taken at k 2 1 = 5 GeV 2 which is far enough above u 0 = 1.7(2) GeV 2 for the B s -case (u 0 = 1.5(2) GeV 2 for the B d -case) as well as high enough to trust the OPE even when analytically continued to Minkowski space.
The term F dis-QHD

12
(k 2 ) in (A.68) consists of the pole term and the dispersion integral. The data entering the pole part is, of course, the same as in (A.52) and consists of the pole data: masses and decay widths as well as the residue which is a product of the decay constant and the B → φ[ρ, ω] FFs at zero momentum transfer. The former are determined in large from experiment and taken from the analysis in appendix C in [29]. The FFs are taken from the same reference which consists of a NLO LCSR analysis.

A.4.2 Plots of the off-shell form factors
Plots of the five FFs are shown in figure 6 in the non-resonant region and three examples in the resonant region. Whereas all of these FFs asymptotically tend to a constant due to the condensate term, as discussed in footnote 13, T * ⊥,L and V ⊥ tend towards zero since there is a cancellation for k 2 ≈ m 2 b . The imaginary part drops to zero at around k 2 = s 0 − m 2 b as can be inferred from (A.71). Both effects will presumably be lifted at NLO. The partonic computation acquires an imaginary part at k 2 > 4m 2 b which is however far to the left of the plot-window and thus not visible. In the hadronic region the single resonance is used together with a continuum threshold. The later results in a ln(u 0 −k 2 ) and is representative of two particle thresholds and effects from higher resonances and the φ . Obviously in the vicinity of u 0 the result is to be used in the sense of bins only.

A.5 Dispersion relation and fit ansatz for form factors
A.5.1 Extending the B → γ on-shell form factors into the q 2 ≈ m 2 B -region The B → γ on-shell FFs F (q 2 ) ≡ F * (q 2 , 0), cf. (A.22), taken from the NLO LCSR analysis [35]. The region of validity of the computation is the previously mentioned q 2 < m b Λ which is just outside our region of interest q 2 ∈ [(4.9 GeV) 2 , m 2 B d,s ]. Progress can be made with the help of the generally valid dispersion representation in the flavour-violating momentum transfer q 2 JHEP09(2021)139    Table 7. Lowest resonance masses for FFs [66]. The mass m 0 − is the B-meson mass in B q → γ.
The Form Factors V, T ⊥ and V, T are associated with the resonances J P = 1 − and 1 + respectively.
where the dots stand for higher resonances and multiparticle states. The values and quantum numbers of the resonances are collected in table 7. The dispersion relations of the other FFs are analogous. The residua are related to the B * q → B q γ and B q1 → B q γ on-shell matrix elements respectively. Unfortunately they are not known from experiment. 18 They can be extracted from the same SR as the FFs themselves by applying a double dispersion relation to interpolate for the B * q -and B q1 -meson respectively. We take the NLO result of this residue from [36], collected in table 6.
Here, we make the link to the predictions of ref. [41], for which a single m B * s -pole approximation was employed to estimate the FFs for the radiative decay. The single-pole approximation is expected to give a reasonable approximation around the pole provided the residue is known sufficiently well. By identifying the defining matrix elements of the residue (cf. eq. (6) in ref. [41]) we find the relation with f Bs = 227MeV [66] the standard decay constant and |µ| defines the strength of the on-shell matrix element in ref. [41]. The authors of ref. [41] determine |µ| = 1.13 GeV −1 in an effective-theory approach valid at leading order in 1/m b,c using experimental data from D * + → D + γ and D * 0 → D + π − . They neglect the pole of the B s1 meson (cf. table 7) and thus we cannot compare the |r V | residua to theirs. Given the methods employed on both sides the agreement of 0.235 (23) and 0.265 is presumably somewhat accidental. Whereas the former is LO in the coupling with preliminary error analysis, the latter is subject to 1/m c corrections which might well be sizeable. At last let us mention that we performed a non-trivial test of the identification in eq. (A.81). Approximating our FF-expression to the pole part, inserting it into our rate in eq. (2.15), and then comparing to the rate in ref. [41] (cf. their eq. (25)), we can confirm JHEP09(2021)139 that eq. (A.81) is consistent with both rates. This is a strong hint of the correctness of the treatment in our work and theirs.

A.5.2 The dispersion representation of the B → γ * off-shell form factors
The assumed q 2 = 0 is well below the various m 2 B -type poles and does not affect the computation. However, in the variable k 2 there are the previously mentioned ρ/ω (φ)-and Υ-resonances (cf. table 7) which are far away from our region of interest k 2 ≈ m 2 B and therefore have little impact. If one wanted to fit the FFs at lower k 2 then a dispersion ansatz, e.g. (A.52), could be combined with the z-expansion.

A.5.3 Fit ansatz and z-expansion
The procedure to fit the FFs and how to include the correlation of uncertainties largely follows ref. [29]. Based on the previous part of this section let us first motivate the fitansatz before summarising the essence of the z-expansion. There are four on-shell FFs and at q 2 = 0 there are five off-shell FFs,  bosons (V) with flavour violating couplings. In this appendix, we present the B q → V rate, which is relevant for a search via B q → µµ, and briefly discuss the search's sensitivity. We parametrise the relevant flavour violating couplings via the effective Lagrangian where Λ is the heavy NP scale. Above, we opted to factorise out the term m 2 V /Λ 2 in the dimension-four vector-and axial-vector couplings to demonstrate the "restoration" of gauge invariance in the limit m V → 0. We further introduce the quantitieŝ which simplify some of the intermediate formulae.
Based on eq. (B.1) we compute the B q → V rate using the helicity formalism. The B q → V decay differs from the semileptonic and flavour-changing-neutral-current decays B → ρ ν and B → K * in that the vector meson is emitted from the weak vertex and the lepton pair originates from an off-shell photon cf. figure 7. The helicity amplitude for the decay reads where g T bq ≡ m Bq /Λ T bq and analogous for T 5 as this leads to transparent formulae. Explicit polarisation vectors are given in eq. (A.18). The helicity amplitudes are then given by with λ (V) Bq = λ(m 2 B , q 2 , m 2 V ) and the FFs The minus sign in the last expression originates from the minus sign in eq. (2.1). Our FFs provide a good description in the case where m V is much smaller than all other scales since we have evaluated them for m V = 0. Three observations on the helicity amplitudes are in order. Firstly, in the m V → 0 limit the pseudoscalar part in the longitudinal component survives since V * L (m 2 V , q 2 ) = −2m 2 Bq /m 2 VV * (m 2 V , q 2 ) in general and V * (0, q 2 ) = P * (0, q 2 ) in particular. Secondly, in the m V → 0 limit the part of the amplitude proportional toĝ T bq happens to be identical to the part proportional toĝ observed that at the kinematic endpoint q 2 → (m B − m V ) 2 one has A 0 = A + = A − (where √ 2A ⊥( ) = A + ∓ A − ) as a result of the restoration spherical symmetry [68].
For the differential rate we find Bq defined above and λ γ ≡ λ(q 2 , m 2 , m 2 ) = q 2 (q 2 − 4m 2 ). The sensitivity study for searching for such light, flavour-violating vectors at the tail of B q → µµ is analogous to the axion study presented in section 3. As an illustration we show here the expected 90% CL exclusion limits with the full LHCb data set of 300 fb −1 for the different cases