The quark flavor-violating ALPs in light of B mesons and hadron colliders

Axion-like particles (ALPs) may induce flavor-changing neutral currents (FCNCs) when their Peccei-Quinn charges are not generation universal. Searches for flavor-violating ALP couplings to the bottom quark have so far focused on FCNC processes of $B$ mesons at low energies. Recent measurements of the rare decays $B\to K +X$ place stringent bounds on the quark flavor violation of a light ALP in different decay modes. In this work we propose a novel direct search for the bottom flavor-violating interaction of a heavy ALP at the LHC and its upgrades, namely QCD production of an ALP in association with one $b$ jet and one light jet, $p~p\to b~j~a$. We consider ALP decays to photons and to muons as well as invisible ALP decays. A Boosted Decision Tree (BDT) classifier, trained on the kinematic observables of signal and background events, is used to analyze the events. Finally, we show the complementarity between the search prospects of hadron colliders and the low-energy constraints from $B$ meson mixing and $B$ meson decays to a light ALP.

The most promising searches for the quark flavor-violating couplings of ALPs proceed through FCNC processes of mesons at low energies, such as meson decays or the mixing of neutral mesons. Recently, the first measurement of the rare decay $B^+ \to K^+ \nu\bar\nu$ at Belle II [53] has drawn much attention in the community. The combination of the inclusive and hadronic tagging results gives the branching ratio $\mathrm{BR}(B^+ \to K^+ \nu\bar\nu)_{\text{Belle II}} = (2.3 \pm 0.7) \times 10^{-5}$. This result is 2.7 standard deviations above the SM expectation [54,55]. Together with the previous searches [56,57], it provides an excellent opportunity to constrain the flavor-violating couplings to the bottom quark as well as the invisible decay mode of the ALP. See Ref. [58] for a recent ALP interpretation of the Belle II measurement and Refs. [59][60][61][62] for earlier considerations of $B$ meson decays to an ALP. Moreover, LHCb, Belle II and BaBar have also searched for the rare decays $B \to K^{(*)} \mu^+\mu^-$ [63][64][65] as well as $B^+ \to K^+ \gamma\gamma$ [66]. These searches place experimental bounds on the visible decay modes of the ALP and its $b$ quark flavor violation. The bounds are significant for $m_a \lesssim m_B$. For a heavier ALP with $m_a \gtrsim m_B$, however, the on-shell two-body decay is forbidden and the suppression by the heavy mediator weakens these low-energy constraints.
High-energy colliders have been the primary tool for the direct probe of new physics (NP) beyond the SM in the past decades. Collider experiments can efficiently reveal the masses and interactions of new particles at the energy frontier. After the CERN Large Hadron Collider (LHC), the luminosity upgrade of the LHC (HL-LHC) will take the lead in searching for new physics beyond the SM [67]. There have also been considerations to construct the next generation of hadron colliders with 100 TeV center-of-mass (c.m.) energy (FCC-hh) [68,69]. These machines provide an ideal environment to search for a heavy ALP with quark flavor-violating interactions. The very recent studies of ALP quark FCNCs at colliders focus on the flavor interactions of an ALP with a top quark [70][71][72][73][74].
In this paper, we investigate the potential of hadron colliders to probe ALP flavor-violating couplings to the bottom quark, and the complementarity with the low-energy $B$ meson constraints. We simulate the QCD production of an ALP with one $b$ jet and one light jet $j$, $p\,p \to b\,j\,a$, through either the FCNC $b$-$d$-$a$ coupling or the $b$-$s$-$a$ coupling. The ALP is taken to be short-lived and to decay either invisibly or into dimuon or diphoton final states. The Boosted Decision Tree (BDT) algorithm is applied to analyze the events, and we train the BDT classifier on the kinematic observables of signal and SM background events. Finally, we compare the sensitivity of the LHC, HL-LHC and FCC-hh to the $b$ quark flavor-violating couplings of the ALP with the constraints from $B$ meson measurements.
The rest of this paper is organized as follows. In Sec. 2, we first describe the theoretical framework for the ALP interactions with the SM fermions and emphasize the quark flavor-violating couplings. We also present the low-energy constraints on the flavor-violating couplings to the $b$ quark from leptonic and semi-leptonic $B$ meson decays and from $B$ meson oscillations. In Sec. 3, we analyze the search for the $b$ quark FCNC of an ALP associated with invisible decay products at hadron colliders. The exclusion limits at colliders are shown in comparison with the low-energy $B$ meson constraints. The numerical analyses for the dimuon and diphoton decay modes of the ALP are given in Sec. 4 and Sec. 5, respectively. We summarize our results in Sec. 6.

General Fermionic Interactions for ALPs

Theoretical formulation
We introduce a generic massive CP-odd scalar $a$, presumably an ALP associated with a global U(1) symmetry that is spontaneously broken above the electroweak scale. Besides the kinetic term for the ALP, the most general effective Lagrangian for fermionic ALP interactions [48,75,76] is written in terms of vector and axial-vector couplings $c^{V,A}_{ij}$, which are complex for $i \neq j$ and satisfy $(c^{V,A}_{ij})^* = c^{V,A}_{ji}$. For on-shell fermions, after applying the equations of motion, the ALP-fermion couplings become proportional to linear combinations of the fermion masses. The flavor-conserving currents are induced only by the pseudo-scalar bilinear and the $c^A_{ii}$ coefficients. The ALP generally has flavor-violating couplings from either the scalar or the pseudo-scalar bilinear. If they are not present at tree level, they are induced via electroweak interactions at the one-loop level. This in particular results in flavor-violating ALP decays into lighter fermions, $a \to f_i \bar f_j + \bar f_i f_j$, for $m_a > m_i + m_j$, whose partial width is evaluated in the limit $m_j = 0$.
The ALP lifetime depends sensitively on its mass $m_a$ and decay constant $f_a$. For $m_a \gg m_b$, and assuming that only down-type quark couplings exist, the above partial width determines the ALP lifetime. The collider searches for a heavy ALP with $b$ jets will be in the regime of prompt ALP decay. For a light ALP with $m_a < m_b$, we take the decay to muons as an example, Eq. (2.5). The ALPs in $B$ meson decays are usually long-lived because of the smaller ALP mass and the stronger limit on the decay constant. The inclusion of additional decay modes would make the ALP shorter-lived. In summary, for $m_a \gg m_b$ in the LHC search region, the current bound on $f_a$ still allows a large parameter space for a promptly decaying ALP, and we therefore focus on the prompt-decay case in the collider analysis below. For $m_a < m_b$, the bounds on $f_a$ from $B$ physics push such light ALPs, if they decay mostly back to SM particles, to be long-lived.
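For orientation, the structure described in this subsection can be sketched with the derivative form that is commonly used in the literature for ALP-fermion couplings. The normalization (the factor $1/2$) and the overall sign convention are assumptions here, since the displayed equations of this section are not reproduced in the present text; only the mass dependence follows from the statements above. Integrating by parts and using the free Dirac equations for on-shell fermions gives:

```latex
\begin{align}
  \mathcal{L}_{aff} &= \frac{\partial_\mu a}{2 f_a}\,
      \bar f_i\, \gamma^\mu \left( c^V_{ij} + c^A_{ij}\,\gamma_5 \right) f_j \\
  &\;\longrightarrow\;
      -\frac{i\,a}{2 f_a}\,
      \bar f_i \left[ (m_i - m_j)\, c^V_{ij} + (m_i + m_j)\, c^A_{ij}\,\gamma_5 \right] f_j
      \qquad \text{(on-shell fermions)}
\end{align}
```

In this form the vector term vanishes for $i = j$, so that only the pseudo-scalar bilinear with coefficient $c^A_{ii}$ survives in the flavor-conserving case, in line with the discussion above.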
There are several constraints on ALPs from meson decays and oscillations. We follow the discussion in Refs. [48,51] and focus on $B$ physics, which constrains the ALP couplings $c^{V,A}_{bq}$. $B$ physics provides constraints that are complementary to collider searches with $b$ jets.
The flavor-violating effective operators in the weak effective theory (WET) can be generated through the exchange of a $W$ boson at loop level [77,78]. They are thus suppressed by the Fermi constant $G_F$ as well as by CKM matrix elements, for instance $G_F V^*_{ts} V_{tb}\, (\bar s_L \gamma^\mu b_L) \sum_q (\bar q \gamma_\mu q)$. If a flavor-conserving ALP coupling is present and the ALP is radiated from one of the quark legs, the $b$-$s$-$a$ flavor-violating coupling can be generated by closing the loop of the quark $q$. This kind of ALP flavor-violating coupling is built from vertices of two effective field theories: the dimension-six four-fermion operators of the WET and the dimension-five flavor-conserving ALP coupling. Refs. [79,80] embedded the ALP chiral theory into the above weak effective Lagrangian for the calculation of the $K \to \pi a$ decay, through ALP-meson mixing, ALP emission from a meson, or the ALP coupling to the four-fermion vertices (the first five Feynman diagrams of Fig. 1 in [79]). These contributions suffer from a further suppression by $f_\pi^2 G_F$ compared to the tree-level off-diagonal ALP couplings to quarks (the sixth diagram in their Fig. 1).
As this is a phenomenological analysis, we remain agnostic about the origin of the ALP flavor couplings to SM quarks. The ALP flavor-violating couplings are simply taken as independent parameters in an effective framework. We refer the reader to existing UV models which realize large flavor-violating couplings. One example is the astrophobic axion proposed in Ref. [81]. The assignment of family-dependent PQ charges in DFSZ-like models suppresses the diagonal couplings of the first-family SM fermions and implies flavor-violating axion couplings.
Other examples are given by models identifying the Peccei-Quinn symmetry U(1)$_{\rm PQ}$ with a global Froggatt-Nielsen U(1) flavor symmetry in order to explain the flavor hierarchy of SM fermions, see e.g. Refs. [39,40,43,82]. This flavor symmetry enforces a Yukawa matrix structure where the diagonal elements (1,1) and (2,2) are suppressed compared to the (3,3) and off-diagonal elements. For example, Ref. [43] realizes a nearest-neighbour texture where only the off-diagonal elements and the (3,3) element are generated when truncating at dimension 7. The other Yukawa matrix elements are further suppressed. The families of SM quarks gain non-universal PQ charges and sizable off-diagonal ALP couplings. The "axiflavon" couplings to the SM fermions are proportional to the sum of the flavor-dependent charges of the SU(2)$_L$ doublet and singlet, see e.g. Eq. (8) in [40]. The couplings are in general not diagonal in the fermion mass eigenstate basis but contain flavor-changing neutral currents. The U(2) axiflavon in Ref. [83] gains flavor-violating couplings in the vector currents which dominate over the diagonal elements.

Leptonic B meson decays
Leptonic $B$ meson decays are sensitive probes for ALPs due to their chiral suppression in the SM [48]. The branching ratio can be expressed in terms of the SM branching ratio [48], with $\alpha$ the fine-structure constant and $f_{B_q}$ the meson decay constant. In particular, the decays to muons are constrained to be [85][86][87][88] $\mathrm{BR}(B_d \to \mu^+\mu^-) = (0.6 \pm 0.7) \times 10^{-10}$ ($< 1.6 \times 10^{-10}$ at 90% CL), Eq. (2.7), with a second limit given in Eq. (2.8). These limits apply to the product of coefficients $c^{A*}_{bq}\, c^A_{\mu\mu}/f_a^2$ if $m_a > m_{B_q}$ or if the ALP decays promptly, and thus do not constrain the same ALP coupling combination as the collider searches.
As the ALP couplings to fermions are proportional to the fermion mass, the ALP contribution to the pseudoscalar meson decay to neutrinos is suppressed. However, any additional invisible decay width of the ALP will be constrained by invisible $B_q$ decays. For instance, Refs. [89,90] proposed the ALP portal to freeze-in dark matter. BaBar obtained an upper limit on invisible $B_d$ decays [91], $\mathrm{BR}(B_d \to \text{inv}) \leq 2.4 \times 10^{-5}$ at 90% CL, Eq. (2.9), and recently the authors of Ref. [92] derived the first upper limit on invisible $B_s$ decays using LEP data, Eq. (2.10), also at 90% CL. For a two-body decay into an invisible fermion-antifermion pair with mass $m_f$, the invisible $B_q$ decay width can be expressed in terms of the ALP decay width $\Gamma(a \to f\bar f)$. The constraint on the invisible $B_q$ branching ratio can be translated into a lower bound on the axion decay constant, which results in the relatively weak constraint $f_a/|c^A_{bq}\, c^A_{ff}| \gtrsim 10^3$ GeV.

Other B meson decays with an ALP in the final state
We consider three main scenarios depending on the dominant ALP decay mode [48]: (i) $B \to M a(\to \text{invisible})$, (ii) $B \to M a(\to \ell^+\ell^-)$, and (iii) $B \to M a(\to \gamma\gamma)$, with $M = K^{(*)}, \pi, \rho$. The calculation for each scenario depends on the ALP mass. For a heavy ALP with $m_a \gtrsim m_B$, the ALP cannot be produced on-shell and its contribution to the semi-leptonic $B$ meson decay can be described within effective field theory (EFT). For lighter ALPs with $m_a < m_B - m_M$, the ALP can be produced on-shell and the decay rates depend on whether the final-state meson is a pseudoscalar $P$ or a vector $V$, where $q = s$ for $P = K$, $V = K^*$, and $q = d$ for $P = \pi$, $V = \rho$. For $\pi^0, \rho^0 \sim (u\bar u - d\bar d)/\sqrt{2}$ there is an additional overall factor of 1/2.

Invisible decays
The branching ratio for semi-invisible decays receives two contributions, Eq. (2.15), where $r_{\rm det}$ is the size of the detector and $\mathrm{BR}(a \to f\bar f, \gamma\gamma)$ denotes the branching ratio for decays to fermion-antifermion pairs and two photons. We focus on the scenario where the ALP decays dominantly invisibly. The first term then dominates the branching ratio, which is approximately given by the product $\mathrm{BR}(B \to P/V + \text{inv.})_{\rm NP} = \mathrm{BR}(B \to P/V + a)\, \mathrm{BR}(a \to \text{inv.})$, Eq. (2.16). Apart from the recent Belle II measurement of $B^+ \to K^+ + \text{inv.}$ [53], there are currently only upper limits. Subtracting the SM contributions, Ref. [93] derived upper limits on non-interfering new physics contributions, which are reproduced in Table 1. From these we constrain the ALP couplings. In Sec. 3, we show in detail the preferred region or upper limits on the couplings, including those obtained from $B \to \pi/\rho + \text{inv.}$
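The role of the detector size $r_{\rm det}$ in the semi-invisible branching ratio can be illustrated with a short sketch. It assumes the standard exponential decay law for the probability that an ALP leaves the detector before decaying, and it assumes that the second contribution is the visible branching ratio weighted by this escape probability; the exact form of the source equation is not reproduced here, and all function and parameter names are hypothetical.

```python
import math

C_LIGHT_M_PER_S = 299_792_458.0  # speed of light in m/s


def escape_probability(m_a_gev, p_a_gev, tau_s, r_det_m):
    """Probability that an ALP travels farther than r_det before decaying.

    Uses the exponential decay law with lab-frame decay length
    beta*gamma*c*tau, where beta*gamma = |p| / m.
    """
    beta_gamma = p_a_gev / m_a_gev
    decay_length_m = beta_gamma * C_LIGHT_M_PER_S * tau_s
    return math.exp(-r_det_m / decay_length_m)


def effective_invisible_fraction(br_inv, br_vis, p_escape):
    """Fraction of ALPs that look invisible (assumed two-term structure):
    genuine invisible decays plus visible decays outside the detector."""
    return br_inv + br_vis * p_escape


# Hypothetical example: a 1 GeV ALP with 2 GeV momentum and 1 ns lifetime
p_esc = escape_probability(m_a_gev=1.0, p_a_gev=2.0, tau_s=1e-9, r_det_m=1.0)
fraction = effective_invisible_fraction(br_inv=0.9, br_vis=0.1, p_escape=p_esc)
```

When the ALP decays dominantly invisibly, `br_vis` is small and the second term is negligible, which is the limit used in the text.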
The Belle II analysis also places constraints on $B \to K^{(*)} e^+e^-$, which are, however, not relevant for the collider study in Sec. 4. See also the recent phenomenological study of the Belle II sensitivity [62].
The most stringent constraint is placed by the LHCb search for $B^+ \to K^+ a(\to \mu^+\mu^-)$ with a displaced vertex [64], which presents constraints on the branching ratio $\mathrm{BR}(B^+ \to K^+ a)\,\mathrm{BR}(a \to \mu^+\mu^-)$ as a function of the mass $m_a$ and lifetime $\tau_a$. Assuming sufficiently short ALP decay lengths, we recast the experimental limit as a bound on the coupling combination as a function of the axion mass $m_a$. The colored regions in Fig. 1 are excluded at 95% CL. The blank regions correspond to the vetoed $K^0_S$, $J/\psi$, $\phi$ and $\psi$ meson resonances. The resulting constraint on the ALP coupling is stronger by two orders of magnitude than the above constraint from invisible decays. Note that the search loses its sensitivity for several masses due to hadronic resonances. Similarly, the LHCb search for $B^0 \to K^{*0} a(\to \mu^+\mu^-)$ [63] provides a constraint on the axial-vector coupling $c^A_{bs}$.
Figure 1. The coloured regions are excluded. The LHCb $B^+ \to K^+ a(\to \mu^+\mu^-)$ analysis is not sensitive for masses close to the $K^0_S$, $J/\psi$, $\psi(2S)$ and $\psi(3770)$ meson masses, which is indicated by gray shaded regions. We do not colour the partially vetoed mass range close to the $\phi$ and $\psi(4160)$ resonances. Similarly, the $B^0 \to K^{*0} a(\to \mu^+\mu^-)$ search loses sensitivity close to the $J/\psi$, $\psi(2S)$ and $\psi(3770)$ meson resonances.

Decays to photons
BaBar searched for ALPs decaying to a pair of photons. We recast the constraints in figures 3 and 4 of Ref. [66] as a constraint on the product of the ALP coupling $|c^V_{bs}|/f_a$ and the square root of the branching ratio to two photons, and present the constraints in Fig. 2. While the BaBar analysis [66] considered the mass range $0.175\,\mathrm{GeV} < m_a < m_{B^+} - m_{K^+}$ for the prompt decay analysis, the search for long-lived ALPs was restricted to $m_a < 2.5$ GeV. The mass ranges close to the pion, $\eta$ and $\eta'$ masses have been excluded due to large peaking backgrounds. The 90% CL limit on the coupling product $|c^V_{bs}|/f_a \sqrt{\mathrm{BR}(a \to \gamma\gamma)}$ turns out to be between $5 \times 10^{-10}$ and $10^{-9}\,\mathrm{GeV}^{-1}$ for promptly decaying ALPs.

B meson oscillations
At leading order in the heavy $b$ quark mass, $B$ meson mixing is described in terms of dimension-6 operators in heavy quark effective theory. The effective Hamiltonian describing $B$ meson mixing contains the dimension-6 operators $O_i$ together with the operators $\tilde O_{1,2,3}$, which are obtained by flipping the chirality $L \leftrightarrow R$ in $O_{1,2,3}$. In the SM only the Wilson coefficient $C_1$ is induced. The ALP induces in addition the three Wilson coefficients $C_2$, $\tilde C_2$ and $C_4$ [48] at tree level at the ALP mass scale $\mu_a \simeq m_a$. They are related to the Wilson coefficients at the hadronic scale by Eq. (2.20). The parameter $\eta$ which describes the running of $C_1$ is given by Eq. (2.21), assuming a heavy ALP with $m_a > m_t$. The Wilson coefficients $C_3$ and $\tilde C_3$ are induced by renormalization group running, but are suppressed compared to $C_2$ and $\tilde C_2$, respectively. They are included in the numerical analysis, but not in the analytic expression below.
The mass difference can be expressed in terms of the Wilson coefficients at the hadronic scale $\mu_b$ [48], Eq. (2.22), where the first term denotes the SM contribution with $\lambda^q_t \equiv V^*_{tq} V_{tb}$ (see footnote 1), $C_i$ and $\tilde C_i$ describe the ALP contributions to the different Wilson coefficients, $f_{B_q}$ is the decay constant of $B_q$, $m_{B_q}$ is the $B_q$ meson mass, $B_{B_q}$ are hadronic parameters taken from [97] and reproduced in Table 2 (bottom), and the normalization factors $\eta^q_i(\mu_b)$ are defined as in [96]. The pre-factor of the coefficient $C_4$ is positive and equal to $-2$ times the pre-factor of $C_2 + \tilde C_2$, and we thus find it to dominate the meson mass difference. The SM prediction and the experimental measurements of the mass differences for $B_d$ and $B_s$ meson mixing are presented in Table 2 (top).
Note that both observables have sizable theoretical errors, larger than the experimental errors. In fact, the experimental errors are negligible compared to the theoretical errors when combining them in quadrature. Moreover, there are discrepancies between the SM predictions of different groups.
The SM predictions reported by FLAG [98,99] are larger and thus deviate further from the experimental measurements. The larger deviation could be explained by an ALP. See [48] for a discussion of how it affects the results. In this work, we take a conservative approach and do not attempt to explain any deviation, because the main focus of the work is on the sensitivity of current and future colliders to flavor-violating ALP scenarios. We thus only show the results for the SM prediction obtained in [97]. For $B_s$-$\bar B_s$ mixing, the SM contribution in the first term of Eq. (2.22) is real and negative, since the imaginary part of $\lambda^s_t$ is small. As the central value of the SM prediction $\Delta M^{\rm SM}_s$ is larger than the experimental measurement, a positive ALP contribution is preferred. For $B_d$-$\bar B_d$ mixing, $\lambda^d_t$, and thus the SM contribution in the first term of Eq. (2.22), is complex with similar magnitudes of the real and imaginary parts.
Table 2. Top: SM prediction (based on weighted average) [97] and experimental measurements [100,101] for the mass differences. Bottom: Hadronic parameters $B_q$ for the relevant operators reproduced from [97]. Recovered entries: $17.7656 \pm 0.0057$ (top); $B_s$ [GeV$^2$]: $0.0441 \pm 0.0017$, $0.0454 \pm 0.0027$, $0.0544 \pm 0.0019$ (bottom).
Footnote 1: Compared with Ref. [48], we additionally include the dependence on the CKM matrix elements through $\lambda^q_t$ to recover the phase information in the off-diagonal element of the mass matrix. In addition, instead of their $k_d$ and $k_D$ couplings, we use a different basis.
In terms of the ALP couplings $c^{V,A}_{bq}/f_a$, the mass differences are evaluated numerically for $m_a = 100$ GeV. As the mass differences are proportional to the square of the ALP couplings, the result is independent of the sign of the ALP couplings. Note that the coefficients of the axial-vector couplings are roughly twice as large as those of the vector couplings and enter with the opposite sign. This results in a cancellation of the ALP contribution to the meson mass splitting for a particular ratio of the vector and axial-vector ALP couplings.
In Fig. 3 we present the $1\sigma$ (green) and $2\sigma$ (yellow) preferred regions as a function of the positive real and imaginary parts of the ALP couplings $c^{V,A}_{bq}/f_a$ for an ALP mass $m_a = 100$ GeV. The ALP couplings which are not explicitly shown have been set to zero. Negating one of the ALP couplings does not change the constraints. Although we only show the results for a fixed ALP mass $m_a = 100$ GeV, it is straightforward to rescale the results to other ALP masses neglecting renormalization group effects, since the Wilson coefficients $C_2$, $\tilde C_2$ and $C_4$ are proportional to $(m_a f_a)^{-2}$.
Fig. 3 (top row) illustrates the dependence of the meson mass differences $\Delta M_q$ on the vector ALP couplings $c^V_{bq}/f_a$. For a fixed real part, the imaginary part of the ALP coupling is bounded from above by $\mathrm{Im}(c^V_{bd(bs)}) \lesssim 8\,(22) \times 10^{-5}\, f_a/\mathrm{GeV}$. For a fixed small imaginary part, there are two disconnected preferred regions due to the destructive interference between the SM and NP contributions. This relaxes the upper bounds to $\mathrm{Re}(c^V_{bd(bs)}) \lesssim 1\,(6) \times 10^{-4}\, f_a/\mathrm{GeV}$. The result for the axial-vector couplings is similar. Due to the relative minus sign in the expressions for the meson mass differences, the real and imaginary axes are swapped, and the larger prefactor results in constraints which are more stringent by a factor $\sim 1.57$ (1.66) for $q = d$ ($s$).
The middle and lower panels of Fig. 3 show the strong correlation between the ALP couplings in the planes of $\mathrm{Re}(c^A_{bq})/f_a$ vs. $\mathrm{Re}(c^V_{bq})/f_a$ (middle) and $\mathrm{Im}(c^A_{bq})/f_a$ vs. $\mathrm{Re}(c^V_{bq})/f_a$ (bottom). The middle panel illustrates the possible cancellation between the real parts of the vector and axial-vector couplings for $\mathrm{Re}(c^V_{bd}) \simeq 1.57\, \mathrm{Re}(c^A_{bd})$ and $\mathrm{Re}(c^V_{bs}) \simeq 1.66\, \mathrm{Re}(c^A_{bs})$. While real axial-vector ALP couplings interfere constructively with the SM contribution, there is destructive interference between the real vector ALP couplings and the SM. The parameter space for imaginary ALP couplings has similar features, as argued above. The lower panel shows the region of parameter space where the vector and axial-vector couplings interfere constructively with each other.
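The rescaling to other ALP masses mentioned in this section can be made explicit with a short sketch. Since the relevant Wilson coefficients scale as $(m_a f_a)^{-2}$ times the square of the coupling, a fixed allowed size of the Wilson coefficient translates into a bound on $c/f_a$ that grows linearly with $m_a$. The function below is a hypothetical helper that assumes a heavy ALP and neglects renormalization group effects, as stated in the text.

```python
def rescale_coupling_bound(bound_at_ref, m_a_gev, m_ref_gev=100.0):
    """Rescale an upper bound on |c|/f_a from a reference ALP mass.

    Assumes the mixing Wilson coefficients scale as (c / (m_a * f_a))**2,
    so a fixed allowed coefficient implies |c|/f_a proportional to m_a.
    Renormalization group running between the two masses is neglected.
    """
    return bound_at_ref * (m_a_gev / m_ref_gev)


# Bound on Im(c^V_bd)/f_a quoted in the text for m_a = 100 GeV, in GeV^-1
bound_100 = 8e-5
# Naive rescaling to m_a = 1000 GeV
bound_1000 = rescale_coupling_bound(bound_100, m_a_gev=1000.0)
```

The linear scaling is only a leading estimate; the running of the Wilson coefficients between the two scales changes the numerical pre-factors.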
While we agree on the order of magnitude of the constraints, the results in [48] differ because of the missing phase information for the SM contribution in [48] and an additional suppression of the NP contribution by a factor of $\sim 1.4$ in the numerical expression for $\Delta M_q$ using $m_a = 10$ GeV (see footnote 2).

FCNC search for invisibly decaying ALP at colliders
We first consider the production of an ALP in association with one $b$ jet and one light jet $j$, $p\,p \to b(\bar b)\,j\,a$, Eq. (3.1). Representative Feynman diagrams of the process $p\,p \to b\,j\,a$ are shown in Fig. 4. Here we assume only the $b$-$d$-$a$ coupling for illustration, and the charge-conjugate diagrams are not displayed for simplicity. We define a "bare" cross section $\sigma_0(bja)$ that is independent of the $b$-$q$-$a$ FCNC couplings, Eq. (3.2). This parameter-independent cross section is shown in Fig. 5 as a function of $m_a$ for $\sqrt{s} = 13$ TeV (blue), 14 TeV (green) and 100 TeV (purple), after applying parton-level cuts. The small difference between the two coupling choices (one of them drawn as dashed lines) is due to the different parton distribution functions (PDFs) of the $d$ and $s$ partons in diagrams (3)-(12) of Fig. 4. If the flavor-conserving couplings $\partial_\mu a\, \bar q \gamma^\mu \gamma_5 q$ or $\partial_\mu a\, \bar b \gamma^\mu \gamma_5 b$ existed, fake $b\,j\,a$ events would arise from mis-identified quark jets.
Here, we assume that the production processes are induced only by the bottom quark flavor-violating couplings of the ALP.
If the tree-level flavor-conserving and flavor-violating ALP couplings are of the same order of magnitude, then, as discussed in Sec. 2.1, the flavor-violating coupling generated via the four-fermion SM operators is more suppressed. Moreover, we expect the production $q\,q \to b\,s\,a$ via the effective operators to be PDF-suppressed compared to our gluon fusion processes. Thus, the production via the SM effective operators and ALP flavor-conserving couplings can be ignored in our analysis.
We assume an invisible decay mode of the ALP. The invisibly decaying ALP induces a large missing transverse energy $\slashed{E}_T$ (MET). We propose a search for an invisible ALP produced in association with visible particles. The major SM backgrounds are thus $jjZ$ and $b\bar bZ$, with the $Z$ boson decaying into neutrinos and with the mis-identification of one jet, $j \to b$ or $b \to j$.
We also include the reducible backgrounds $jjW^\pm$, $b\bar bW^\pm$ and $t\bar t$, in which the charged lepton from the leptonic $W$ decay is vetoed. The K factors are 1.6 (2.5) for $b\bar bZ$ ($b\bar bW^\pm$) [102], 1.3 (2.3) for $jjZ$ ($jjW^\pm$) [103] and 1.8 for $t\bar t$ [104]. The model file of the ALP with FCNC couplings is produced by FeynRules [105] and is interfaced with MadGraph5_aMC@NLO [106] to generate signal events. Both the signal and background events are then passed to Pythia 8 [107] and Delphes 3 [108] for parton shower and detector simulation, respectively. The default Delphes cards for ATLAS, HL-LHC or FCC-hh are used for the $b$-tagging efficiency. We pre-select events with at least two jets, one of which is tagged as a $b$ jet, that satisfy the pre-selection cuts. The jets are reconstructed using the anti-$k_T$ algorithm with $R = 0.4$ in FastJet [109]. The $jjW^\pm$, $b\bar bW^\pm$ and $t\bar t$ backgrounds can be further reduced by vetoing charged leptons. After pre-selection and including the K factors, the cross sections of the different SM backgrounds are evaluated, with $\sigma_{t\bar t} = 2.14$ pb for the $t\bar t$ background.
Footnote 2: The numerical pre-factors can be straightforwardly calculated from the renormalization group equations, the normalization constants $\eta^q_i(\mu_b)$ and the running hadronic parameters $B_q(\mu_b)\, m_{B_q}$. Renormalization group corrections enhance the pre-factors of $C_2$ ($C_4$) by 1.153 (1.274) for $m_a = 10$ GeV, and the hadronic parameters are given in Table 2.
The application of decision trees in multivariate analysis is a highly effective and increasingly popular method to distinguish signal from background events. It is expected to discriminate better than a traditional cut-based analysis. We employ the Boosted Decision Tree (BDT) algorithm [110] implemented in XGBoost [111] to analyze the events passing the above pre-selection cuts. The kinematic observables used to train the BDT classifier are as follows:
• the transverse momentum, pseudo-rapidity and azimuthal angle of the final $b$ jet and light jet $j$: $p_T(b)$, $p_T(j)$, $\eta(b)$, $\eta(j)$, $\phi(b)$, $\phi(j)$;
• the difference of the pseudo-rapidity, the difference of the azimuthal angle and the separation in angular space between the $b$ jet and light jet: $\Delta\eta(b,j)$, $\Delta\phi(b,j)$, $\Delta R(b,j)$;
• the invariant mass of the $b$ jet and light jet, $m_{bj}$;
• the missing transverse energy $\slashed{E}_T$;
• the sum of the transverse momenta of the final visible objects, $H_T = p_T(b) + p_T(j)$.
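The derived observables in the list above can be computed from the jet transverse momenta, pseudo-rapidities and azimuthal angles. The sketch below is a minimal illustration, not the analysis code of this work: the function names are hypothetical, and the invariant mass uses the massless-jet approximation $m_{bj}^2 = 2\,p_T(b)\,p_T(j)\,[\cosh\Delta\eta - \cos\Delta\phi]$, which is an assumption made here for simplicity.

```python
import math


def delta_phi(phi1, phi2):
    """Azimuthal separation wrapped into the interval [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def bj_observables(pt_b, eta_b, phi_b, pt_j, eta_j, phi_j):
    """Derived BDT inputs for the b jet and the light jet (pT in GeV)."""
    d_eta = eta_b - eta_j
    d_phi = delta_phi(phi_b, phi_j)
    d_r = math.hypot(d_eta, d_phi)  # angular separation Delta R
    # Invariant mass in the massless-jet approximation (assumption)
    m_bj = math.sqrt(2.0 * pt_b * pt_j * (math.cosh(d_eta) - math.cos(d_phi)))
    h_t = pt_b + pt_j  # scalar sum of visible transverse momenta
    return {"d_eta": d_eta, "d_phi": d_phi, "d_r": d_r, "m_bj": m_bj, "h_t": h_t}


# Two back-to-back central jets with pT = 50 GeV each
obs = bj_observables(50.0, 0.0, 0.0, 50.0, 0.0, math.pi)
```

For the back-to-back configuration in the example, the invariant mass equals the scalar sum of the transverse momenta, which is a quick sanity check of the formula.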
We then take the probability with which the BDT algorithm classifies an event as signal as the BDT response score. The BDT response score distributions of the signal (red) and the total SM background are shown in Fig. 6 for benchmark ALP masses and a coupling of $|c_{bq}|/f_a = 1\,\mathrm{TeV}^{-1}$. We obtain the BDT cut by maximizing the significance of Ref. [112], Eq. (3.7), where $s$ and $b$ are the signal and background event expectations, respectively. The obtained BDT cut, the cut efficiencies of signal and backgrounds, and the significance are collected in Table 3 for the above benchmarks. We put the results for HL-LHC and FCC-hh in Appendix A.1. The BDT cut becomes more severe as the ALP mass increases, except for low masses $m_a \simeq 10$ GeV.
In Fig. 7, for $m_a \gtrsim 5$ GeV, we show the $2\sigma$ exclusion limits on $|c_{bq}|/f_a$ with $q = d$ (left) and $q = s$ (right) at the LHC (blue lines), HL-LHC (orange lines) and FCC-hh (purple lines). Here we assume that either $|c^V_{bq}|$ or $|c^A_{bq}|$ is present in order to compare with the low-energy $B$ decay constraints for $m_a \lesssim 5$ GeV. The limits from the LHC and HL-LHC are at the level of $10^{-4}\,\mathrm{GeV}^{-1}$ and $10^{-3}\,\mathrm{GeV}^{-1}$ for $m_a = 10$ GeV and $m_a = 100$ GeV, respectively. The FCC-hh can lower the limits by a factor of 3 to 6. Following Sec. 2.3.1, for $m_a \lesssim 5$ GeV the preferred region or upper limits from the semi-invisible $B$ decays are also shown, including those from $B \to \pi/\rho + \text{inv.}$ (right). We assume that the mass of the invisible particle is smaller than $m_a/2$. The upper limit from $B^+ \to \pi^+ + \text{inv.}$ is shown as a black solid curve, a further limit as a black dashed curve, and that from $B^0 \to K^{*0} + \text{inv.}$ or $B^0 \to \rho^0 + \text{inv.}$ as a red dashed curve. The $2\sigma$ region preferred by $B^+ \to K^+ + \text{inv.}$ is given by the gray band. These low-energy constraints are at least four orders of magnitude more stringent than the collider bounds.
Moreover, we also compare with the $B$-$\bar B$ mixing constraints on $c_{bq}/f_a$ for a heavy ALP (green band, with a dashed line as the boundary of the preferred region). Here, we only show the favored regions for the benchmark choices $c^V_{bd} = e^{i\pi/8}$ and $c^V_{bs} = 1$, which maximize the destructive interference with the SM. It turns out that the FCNC search for an invisibly decaying ALP at hadron colliders is not able to reach the parameter space preferred by $B$ meson oscillations for these benchmarks, unless the vector and axial-vector ALP couplings cancel to high precision.
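The choice of the BDT cut by maximizing a significance, as described in this section, can be sketched as a simple scan over candidate thresholds. The significance formula of the source is not reproduced in this text, so the sketch assumes the widely used Asimov approximation $Z = \sqrt{2[(s+b)\ln(1+s/b) - s]}$; the function names, the toy scores and the expected yields are all hypothetical.

```python
import math


def asimov_significance(s, b):
    """Asimov discovery significance (assumed form, not taken from the source).

    Returns 0 when s or b is non-positive, where the formula is undefined,
    so that such cuts are never selected by the scan.
    """
    if s <= 0.0 or b <= 0.0:
        return 0.0
    return math.sqrt(2.0 * ((s + b) * math.log(1.0 + s / b) - s))


def best_bdt_cut(sig_scores, bkg_scores, n_sig, n_bkg, cuts):
    """Scan thresholds on the BDT score and return (cut, significance).

    n_sig and n_bkg are the expected event yields before the BDT cut;
    the efficiencies are estimated from the simulated score samples.
    """
    best_cut, best_z = None, 0.0
    for cut in cuts:
        eff_s = sum(x > cut for x in sig_scores) / len(sig_scores)
        eff_b = sum(x > cut for x in bkg_scores) / len(bkg_scores)
        z = asimov_significance(eff_s * n_sig, eff_b * n_bkg)
        if z > best_z:
            best_cut, best_z = cut, z
    return best_cut, best_z


# Toy score samples: signal peaks near 1, background near 0
cut, z = best_bdt_cut([0.9, 0.8, 0.7, 0.2], [0.1, 0.2, 0.3, 0.8],
                      n_sig=40.0, n_bkg=400.0, cuts=[0.0, 0.5])
```

In the limit $s \ll b$ the assumed formula reduces to $s/\sqrt{b}$, so the scan favors cuts that remove background faster than signal.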
When machine learning is applied in theoretical particle physics, it is often regarded as a black-box algorithm because of its lack of interpretability. However, the Shapley value can serve as an effective quantity to better understand the application of machine learning in phenomenological studies of particle physics [113]. As a well-known concept in cooperative game theory, the Shapley values [114] were introduced by Shapley in the 1950s to solve the problem of fairly allocating the payoffs to each player in an $n$-player cooperative game based on their respective contributions. In a cooperative game characterized by $(v, N)$ with $n$ players, $N = \{1, \dots, n\}$ is the set of players in the game. A subset $T \subseteq N$ is called a coalition, and the largest coalition is $N$ itself. The characteristic function $v$ maps each coalition to a real number: $v(T)$ is the payoff of the coalition $T$, and $v(T \cup \{i\}) - v(T)$ is the marginal contribution of the $i$-th player to a coalition $T$ not containing $i$. The Shapley value, i.e. the payoff of the $i$-th player after considering all possible subsets $T$ not containing $i$, is defined as [115,116]
$S_v(i) = \sum_{T \subseteq N \setminus \{i\}} \frac{|T|!\,(n - |T| - 1)!}{n!}\, \left[ v(T \cup \{i\}) - v(T) \right]$,
where $|T|$ is the cardinality of the coalition $T$.
For this work, the Shapley values serve as a tool to understand which of the kinematic observables are more important for the separation of the signal from the SM background. For a single event with $n$ kinematic variables, there are $n$ associated Shapley values $S_v$. A positive $S_v$ indicates that the event is more likely to belong to a certain channel, while a negative value means that the event is less likely to belong to this channel. We compute the average of the absolute Shapley values $|S_v|$ over all events for each kinematic observable and show the ten observables with the highest values for the benchmark masses $m_a = 10$ GeV (left panels) and 1000 GeV (right panels) at the LHC in Fig. 8. For the invisibly decaying ALPs (top panels), as expected, the missing transverse energy (MET) has the highest Shapley value and is the most important observable to discriminate signal from background events. The discrimination power of MET is dominant for large $m_a$ around and above 1000 GeV, for which the MET value of the signal is also large. For smaller $m_a$ around 10 GeV and relatively moderate MET values, additional observables such as $p_T(b)$ also help to distinguish the ALP from SM backgrounds such as $jj(b\bar b)Z$. For the decay channels $a \to \mu^+\mu^-$ (middle panels) and $a \to \gamma\gamma$ (bottom panels), to be analyzed in Sec. 4 and Sec. 5, the narrow ALP resonance makes the invariant mass of the final muon pair $m_{\mu\mu}$ or photon pair $m_{\gamma\gamma}$ far more important than any other observable. With larger $m_a$, however, the model-dependent total width of the ALP also grows, and secondary observables such as $p_T(\mu, \gamma)$ further help to distinguish the ALP from the SM background. The Shapley values are consistent with our expectations and indicate the importance of the kinematic observables in each channel.
Finally, we comment on the validity of the effective ALP theory. The EFT approximation is valid for $f_a > m_a/4\pi$. The collider limits are compatible with EFT validity. Larger couplings $c_{bq}/f_a$ will violate perturbative unitarity. The EFT expansion then breaks down and cannot provide a reliable description of an underlying theory. Moreover, EFT validity at colliders further requires that the decay constant be larger than the partonic c.m. energy of the subprocesses in the signal events, i.e., $f_a > \sqrt{\hat s}$. However, one is not able to measure $\sqrt{\hat s}$ directly at hadron colliders. We can approximately take the invariant mass of the final visible states as $\sqrt{\hat s}$, for instance $M_{\rm inv.} = M_{bj\mu\mu}$ or $M_{bj\gamma\gamma}$ for the two jets and the visible decay products of the ALP. The c.m. energy scale then depends strongly on the ALP mass and increases as the mass gets larger. Although it is not practical to impose the cut $f_a > M_{\rm inv.}$ on the signal events, we note that the effect of this cut would become weaker for larger $f_a$ and that the collider limits would be relaxed for a very heavy ALP. For a roughly quantitative illustration, as shown in the collider exclusion figures, we consider a horizontal line at $10^{-3}\,\mathrm{GeV}^{-1}$ representing the typical reaction scale for the LHC and HL-LHC, and at $10^{-4}\,\mathrm{GeV}^{-1}$ for FCC-hh. One can see that a majority of the collider limits in both the dimuon and diphoton channels fulfill the EFT validity.
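The definition of the Shapley value given earlier in this section can be made concrete with a brute-force evaluation for a small cooperative game. The sketch below enumerates all coalitions explicitly, which is only feasible for a handful of players; tree-based tools use faster algorithms in practice. The two-observable payoff table is a hypothetical toy example and not a result of this analysis.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, v):
    """Exact Shapley values for a characteristic function v(frozenset)."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):  # coalition sizes |T| = 0 .. n-1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                t = frozenset(coalition)
                # marginal contribution of player i to coalition T
                total += weight * (v(t | {i}) - v(t))
        values[i] = total
    return values


# Toy payoffs (hypothetical): separation power reached by each set of observables
payoff = {
    frozenset(): 0.0,
    frozenset({"MET"}): 0.6,
    frozenset({"pT_b"}): 0.2,
    frozenset({"MET", "pT_b"}): 0.9,
}
result = shapley_values(["MET", "pT_b"], lambda t: payoff[t])
```

In this toy game the values add up to the payoff of the full coalition, which is the efficiency property that makes the Shapley value a fair allocation.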

FCNC search for ALP decaying to dimuon at colliders
We suppose that the ALP promptly decays into a muon-antimuon pair, i.e., $a\to\mu^+\mu^-$. We pre-select events containing at least two muons and two jets, one of which is tagged as a $b$ jet, and which satisfy the pre-selection cuts. We also set a maximal missing transverse energy. The major SM backgrounds include $jjZ/\gamma$, $bbZ/\gamma$, $jjWW$, $bbWW$ and $t\bar t$ production, with the W boson or top quark decaying leptonically. The K factors for $jjWW$ and $bbWW$ are 1.19 [117] and 1.12 [118], respectively. After pre-selection and considering the K factors, the cross sections of the different SM backgrounds at LHC with $\sqrt{s} = 13$ TeV are $\sigma_{jjZ/\gamma} = 1.47$ pb, $\sigma_{bbZ/\gamma} = 0.45$ pb, $\sigma_{jjWW} = 5.02\times 10^{-4}$ pb, $\sigma_{bbWW} = 0.16$ pb and $\sigma_{t\bar t} = 0.24$ pb, respectively.

After including the final muons from the ALP decay, the kinematic observables considered to train the BDT classifier are as follows:
• transverse momentum, pseudo-rapidity and azimuthal angle of the final $b$ jet, light jet $j$ and muons $\mu$: $p_T(b)$, $p_T(j)$, $p_T(\mu)$, $\eta(b)$, $\eta(j)$, $\eta(\mu)$, $\phi(b)$, $\phi(j)$, $\phi(\mu)$;
• the difference of the pseudo-rapidity, the difference of the azimuthal angle and the separation in angular space between each two of the four final states, $\Delta\eta_{ij}$, $\Delta\phi_{ij}$, $\Delta R_{ij}$, where $i, j = b, j, \mu_1, \mu_2$ ($i \neq j$);
• the invariant mass of the $b$ jet and light jet $m_{bj}$, and of the two final muons $m_{\mu\mu}$;
• the missing transverse energy $E_T^{\rm miss}$;
• the sum of the transverse momenta of final visible objects $H_T = p_T(b) + p_T(j) + p_T(\mu_1) + p_T(\mu_2)$.

Since the final muons come from different decay processes in the signal and the SM backgrounds, the kinematic observables associated with muons allow for a significant distinction between them. We also show the distributions of the BDT response score of the signal (green) and total SM background (brown) with c.m. energy $\sqrt{s} = 13$ TeV and luminosity $L = 300$ fb$^{-1}$ for benchmark masses $m_a = 10$, 50, 300 and 1000 GeV, and $|c^{V(A)}_{bq}|/f_a = 1$ TeV$^{-1}$ in Fig. 9. We can see that the signal and backgrounds are better separated than in the invisible channel, due to the reconstructable ALP resonance. By maximizing the significance in Eq. (3.7), we obtain the BDT cut.
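The kinematic observables listed above can be built from the reconstructed $(p_T, \eta, \phi)$ of the four visible objects. The sketch below is illustrative only: the dictionary-based event format, the function names and the feature labels are assumptions made for this example, and all objects are treated as massless unless a mass is supplied.

```python
import math
from itertools import combinations

def delta_phi(phi1, phi2):
    """Azimuthal difference wrapped into (-pi, pi]."""
    d = (phi1 - phi2) % (2.0 * math.pi)  # now in [0, 2*pi)
    return d - 2.0 * math.pi if d > math.pi else d

def four_momentum(obj):
    """(E, px, py, pz) from pT, eta, phi and an optional mass (default 0)."""
    pt, eta, phi = obj["pt"], obj["eta"], obj["phi"]
    m = obj.get("mass", 0.0)
    px, py, pz = pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(eta)
    return math.sqrt(px**2 + py**2 + pz**2 + m**2), px, py, pz

def invariant_mass(*objs):
    """Invariant mass of the summed four-momenta of the given objects."""
    e, px, py, pz = (sum(c) for c in zip(*(four_momentum(o) for o in objs)))
    return math.sqrt(max(e**2 - px**2 - py**2 - pz**2, 0.0))

def event_features(b, j, mu1, mu2, met):
    """Feature dictionary for one b j mu mu event (hypothetical layout)."""
    objs = {"b": b, "j": j, "mu1": mu1, "mu2": mu2}
    feats = {
        "MET": met,
        "m_bj": invariant_mass(b, j),
        "m_mumu": invariant_mass(mu1, mu2),
        "HT": sum(o["pt"] for o in objs.values()),
    }
    # Single-object observables: pT, eta, phi
    for name, o in objs.items():
        feats[f"pt_{name}"] = o["pt"]
        feats[f"eta_{name}"] = o["eta"]
        feats[f"phi_{name}"] = o["phi"]
    # Pairwise observables: delta eta, delta phi, delta R
    for (n1, o1), (n2, o2) in combinations(objs.items(), 2):
        deta = o1["eta"] - o2["eta"]
        dphi = delta_phi(o1["phi"], o2["phi"])
        feats[f"deta_{n1}_{n2}"] = deta
        feats[f"dphi_{n1}_{n2}"] = dphi
        feats[f"dR_{n1}_{n2}"] = math.hypot(deta, dphi)
    return feats
```

The same construction applies to the diphoton channel after replacing the two muons by the two photons.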
The information on the signal for the above benchmarks and on the backgrounds is provided in Table 4. We put the results for HL-LHC and FCC-hh in Appendix A.2. In this case one can achieve a good separation between the signal and the backgrounds: the remaining fraction of signal events is more than 95%, while the BDT cut efficiency of the backgrounds is at the level of $10^{-4}$ to $10^{-3}$. The BDT cut is universal for all benchmark masses.
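Choosing the BDT cut amounts to scanning thresholds on the response score and keeping the one with the largest significance. Since Eq. (3.7) is not reproduced in this section, the sketch below uses $S/\sqrt{S+B}$ as a stand-in; this formula, the function name and the input format are assumptions for illustration.

```python
import math

def scan_bdt_cut(sig_scores, bkg_scores, n_sig, n_bkg, cuts):
    """Return the cut maximizing S/sqrt(S+B) (a stand-in for Eq. (3.7)).

    sig_scores, bkg_scores : BDT response scores of simulated events
    n_sig, n_bkg           : expected yields before the BDT cut
    cuts                   : iterable of candidate thresholds
    """
    best = None
    for cut in cuts:
        # Fraction of events passing the threshold
        eff_s = sum(s > cut for s in sig_scores) / len(sig_scores)
        eff_b = sum(s > cut for s in bkg_scores) / len(bkg_scores)
        s, b = n_sig * eff_s, n_bkg * eff_b
        if s + b == 0.0:
            continue  # no events left, significance undefined
        z = s / math.sqrt(s + b)
        if best is None or z > best["significance"]:
            best = {"cut": cut, "eff_sig": eff_s, "eff_bkg": eff_b,
                    "significance": z}
    return best
```

In practice the background efficiency would be evaluated per background process and weighted by its cross section before summing.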
To better demonstrate the distinction between the signal and background, we show the invariant mass distribution of the final muon pair in the signal after applying the BDT cut in Fig. 10. The 2σ exclusion limits on $|c^{V(A)}_{bq}|/f_a\,{\rm BR}(a\to\mu^+\mu^-)$ are displayed for $m_a \gtrsim 5$ GeV in Fig. 11, with $q = d$ (left) and $q = s$ (right) at LHC, HL-LHC and FCC-hh. The bounds are more stringent than those from the $a\to$ inv. channel by one order of magnitude. As mentioned in Sec. 2.3.2, we also show the low-energy constraints from LHCb. We take the limits obtained by assuming a short ALP lifetime ($c\tau_a = 1$ mm). They are more severe than the collider bounds by at least five orders of magnitude. Under the assumption of ${\rm BR}(a\to\mu^+\mu^-) = 1$, the FCC-hh can probe the parameter space of both $c^V_{bd}/f_a$ and $c^V_{bs}/f_a$ favored by the B meson oscillation for the two benchmarks. The LHC and HL-LHC are able to reach the favored parameter space of $c^V_{bs}/f_a$ for $m_a \lesssim 1$ TeV.
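A benchmark result at $|c|/f_a = 1$ TeV$^{-1}$ can be translated into a 2σ exclusion limit if the signal yield is taken to scale quadratically with the coupling at fixed branching ratio. The sketch below assumes this scaling and again uses $S/\sqrt{S+B}$ in place of Eq. (3.7), which is not reproduced here; the function name and arguments are hypothetical.

```python
import math

def coupling_limit(s_ref, b, g_ref=1.0, z_target=2.0):
    """Coupling at which S/sqrt(S+B) equals z_target.

    s_ref    : signal yield after the BDT cut at the reference coupling g_ref
    b        : background yield after the BDT cut
    g_ref    : reference coupling |c|/f_a (e.g. 1 TeV^-1)
    z_target : required significance (2 for a 2-sigma exclusion)

    Assumes S(g) = s_ref * (g / g_ref)**2 and solves
    S**2 - z**2 * S - z**2 * B = 0 for the positive root.
    """
    z2 = z_target**2
    s_needed = 0.5 * (z2 + math.sqrt(z2**2 + 4.0 * z2 * b))
    return g_ref * math.sqrt(s_needed / s_ref)

# Example with made-up yields: 2200 signal and 99 background events at g_ref = 1 TeV^-1
print(coupling_limit(s_ref=2200.0, b=99.0))
```

The returned value carries the units of `g_ref`.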

FCNC search for ALP decaying to diphoton at colliders
We assume that the ALP promptly decays into a photon pair. We pre-select events containing at least two photons and two jets, one of which is tagged as a $b$ jet, and which satisfy the pre-selection cuts. The major SM backgrounds include

$$ p\,p \to jj\gamma\gamma\,,\ bb\gamma\gamma\,. \qquad (5.2) $$

The K factors for $jj\gamma\gamma$ and $bb\gamma\gamma$ are 1.3 [119,120] and 1.36 [121], respectively. After pre-selection and considering the K factors, the cross sections of the SM backgrounds at LHC with $\sqrt{s} = 13$ TeV are $\sigma_{jj\gamma\gamma} = 1.26$ pb and $\sigma_{bb\gamma\gamma} = 0.06$ pb.
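To get a feeling for the size of these backgrounds, the quoted cross sections can be converted into expected event counts via $N = \sigma \times L \times \epsilon$. The snippet below is a minimal sketch; the function name and the optional efficiency argument are illustrative.

```python
PB_TO_FB = 1000.0  # 1 pb = 1000 fb

def expected_events(sigma_pb, lumi_fb_inv, efficiency=1.0):
    """Expected number of events N = sigma * L * efficiency."""
    return sigma_pb * PB_TO_FB * lumi_fb_inv * efficiency

# Cross sections after pre-selection and K factors, as quoted in the text
backgrounds_pb = {"jjgammagamma": 1.26, "bbgammagamma": 0.06}
for name, sigma in backgrounds_pb.items():
    print(name, expected_events(sigma, lumi_fb_inv=300.0))
```

For $L = 300$ fb$^{-1}$ this gives about $3.8\times 10^{5}$ $jj\gamma\gamma$ and $1.8\times 10^{4}$ $bb\gamma\gamma$ events before the BDT cut.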
The kinematic observables considered to train the BDT classifier are as follows:
• transverse momentum, pseudo-rapidity and azimuthal angle of the final $b$ jet, light jet $j$ and photons $\gamma$: $p_T(b)$, $p_T(j)$, $p_T(\gamma)$, $\eta(b)$, $\eta(j)$, $\eta(\gamma)$, $\phi(b)$, $\phi(j)$, $\phi(\gamma)$;
• the difference of the pseudo-rapidity, the difference of the azimuthal angle and the separation in angular space between each two of the four final states, $\Delta\eta_{ij}$, $\Delta\phi_{ij}$, $\Delta R_{ij}$, where $i, j = b, j, \gamma_1, \gamma_2$ ($i \neq j$);
• the invariant mass of the $b$ jet and light jet $m_{bj}$, and of the two final photons $m_{\gamma\gamma}$;
• the missing transverse energy $E_T^{\rm miss}$;
• the sum of the transverse momenta of final visible objects $H_T = p_T(b) + p_T(j) + p_T(\gamma_1) + p_T(\gamma_2)$.
In Fig. 12, we show the distributions of the BDT response score of the signal (yellow) and total SM background (purple) for LHC with $\sqrt{s} = 13$ TeV and luminosity $L = 300$ fb$^{-1}$. The signal and backgrounds are also well separated, similar to the dimuon channel. The BDT cut and cut efficiencies for LHC with $\sqrt{s} = 13$ TeV are collected in Table 5. The results for HL-LHC and FCC-hh are given in Appendix A.3. The BDT cut of 0.9 is also universal for all ALP mass benchmarks and colliders. Similar to the $a\to\mu^+\mu^-$ channel, the diphoton invariant mass in the signal can be well reconstructed after applying the BDT cut, as shown in Fig. 13. The 2σ exclusion limits on $|c^{V(A)}_{bq}|/f_a\,{\rm BR}(a\to\gamma\gamma)$ are displayed for $m_a \gtrsim 5$ GeV in Fig. 14, with $q = d$ (left) and $q = s$ (right) at LHC, HL-LHC and FCC-hh. These limits are close to those of the $a\to\mu^+\mu^-$ channel. The low-energy constraints are also shown for comparison. The low-energy bound from $B^+\to K^+ a(\to\gamma\gamma)$ is more severe than the collider bounds by at least four orders of magnitude. Under the assumption of ${\rm BR}(a\to\gamma\gamma) = 1$, only the FCC-hh (all LHC, HL-LHC and FCC-hh) can reach the parameter space of both $c^V_{bd}/f_a$ (c

Conclusions
In this paper, we study the bottom quark flavor-violating interactions of the ALP through the low-energy FCNC processes of B mesons and at high-energy hadron colliders. We investigate the low-energy constraints from recent B meson decay measurements and the mixing of neutral B mesons. We also explore the search potential for the flavor-violating couplings of a heavy ALP to the bottom quark at the LHC and its upgrades. Our main results are summarized as follows.
• For a light ALP with $m_a \lesssim m_B$, assuming a promptly decaying ALP, the B meson FCNC decays constrain the product $c^{V,A}_{bq}/f_a\,{\rm BR}(a\to X)$ as low as the level of $10^{-8}$ ($10^{-10}$)
• The B meson oscillations constrain the heavy ALP regime. The bounds depend strongly on the assumption about the $c^{V,A}_{bq}$ parameters. The $B_d - \bar B_d$ mixing places more stringent constraints on $c^{V,A}_{bd}$ than the $B_s - \bar B_s$ mixing does on $c^{V,A}_{bs}$.
• The FCNC search for an invisibly decaying ALP at hadron colliders cannot reach the parameter space of $c^{V,A}_{bq}/f_a$ and $m_a$ preferred by the B meson oscillations.
• The exclusion limits of the FCNC search for the ALP decaying to dimuon at hadron colliders are more stringent than those from the $a\to$ inv. channel by one order of magnitude. The HL-LHC and FCC-hh are able to probe the parameter space preferred by the B meson oscillations. In particular, in the region $m_B \lesssim m_a \lesssim 1$ TeV, the preferred FCNC coupling $c^V_{bs}$ should already be probed or excluded by the $bj$ + dimuon search at the current 13 TeV LHC.
• The exclusion limits of the FCNC search for the ALP decaying to diphoton at hadron colliders are close to those from the dimuon channel.
A The BDT score distribution and cut efficiency at HL-LHC and FCC-hh
A.1 $a\to$ inv.

Figure 2. Upper limits from BaBar search for B + → K + a(→ γγ).The coloured regions are excluded and the gray shaded regions have been excluded in the analysis due to the vicinity to the π 0 , η, and η ′ meson resonances.

We compare the results of both $b$-$d$-$a$ couplings $c^{V(A)}_{bd}$ (solid lines) and $b$-$s$-$a$ couplings $c^{V(A)}_{bs}$

Figure 4. The representative Feynman diagrams of the process $p\,p \to b\,j\,a$ induced by $b$-$d$-$a$ couplings at the LHC. The label "ax" denotes the ALP $a$, and $q$ ($\bar q$) $= u, c, d, s$ ($\bar u, \bar c, \bar d, \bar s$).

Figure 5. The parameter-independent cross section of $p\,p \to b(\bar b)\,j\,a$ as a function of $m_a$ with $\sqrt{s} = 13$ (blue), 14 (orange) and 100 (purple) TeV, respectively. For comparison, we show two cases of FCNC with the ALP only coupled with $b$ quark and $d$ quark (solid line, $c^{V(A)}_{bd}$) and only coupled with $b$ quark and $s$ quark (dashed line, $c^{V(A)}_{bs}$), respectively. The renormalization scale and factorization scale are taken as $\mu_R = \mu_F = \sqrt{\hat s}/2$ with $\sqrt{\hat s}$ being the partonic c.m. energy.
LHC with $\sqrt{s} = 13$ TeV are $\sigma_{jjZ} = 31.29$ pb, $\sigma_{bbZ} = 7.39$ pb, $\sigma_{jjW} = 100.77$ pb, $\sigma_{bbW} = 2.48$

Figure 6. The BDT response score distribution of signal $bj$ + inv. (red) and total SM background (blue) with $m_a = 10$, 50, 300 and 1000 GeV (from top to bottom) at LHC with $\sqrt{s} = 13$ TeV and $L = 300$ fb$^{-1}$. The "sig" or "bkg" in the legend represents the signal or the sum of SM backgrounds. The grey dashed line indicates the BDT cut that maximizes the significance with fixed $|c^{V(A)}_{bd}|/f_a = 1$ TeV$^{-1}$ (left four panels)

Figure 8. The Shapley values of the kinematic observables considered to train the BDT classifier, with the mass benchmarks $m_a = 10$ GeV (left) and 1000 GeV (right), for the decay channels $a\to$ inv. (top panels), $a\to\mu^+\mu^-$ (middle panels) and $a\to\gamma\gamma$ (bottom panels) at LHC. Only the ten observables with the highest Shapley values are shown. "MET" represents the missing transverse energy.

Figure 9. The BDT response score distribution of signal $bj\mu\mu$ (green) and total SM background (brown) with $m_a = 10$, 50, 300 and 1000 GeV (from top to bottom) at LHC with $\sqrt{s} = 13$ TeV and $L = 300$ fb$^{-1}$. The grey dashed line indicates the BDT cut that maximizes the significance with fixed $|c^{V(A)}_{bd}|/f_a = 1$ TeV$^{-1}$ (left four panels) or $|c^{V(A)}_{bs}|/f_a = 1$ TeV$^{-1}$ (right four panels).
(3.7) for the signal $bj\gamma\gamma$ and SM backgrounds at LHC with $\sqrt{s} = 13$ TeV and $L = 300$ fb$^{-1}$. The benchmark masses are $m_a = 10$, 50, 300 and 1000 GeV and the parameter is fixed as $|c^{V(A)}_{bd}|/f_a = 1$ TeV$^{-1}$ (above the double line) or $|c^{V(A)}_{bs}|/f_a = 1$ TeV$^{-1}$ (below the double line).

Figure 13. The distribution of the invariant mass of the photon pair in $pp \to bj\gamma\gamma$ after applying the BDT cut for the benchmarks $m_a = 10$, 50, 300 and 1000 GeV at LHC with $\sqrt{s} = 13$ TeV and $L = 300$ fb$^{-1}$. The parameter is fixed as |c
(HL-LHC) and Fig. 18 (FCC-hh). The efficiencies of signal and backgrounds after applying the BDT cut are shown in

Table 1. Decay modes of B mesons and the corresponding measurements or bounds on the NP contribution. The upper bounds have been obtained by subtracting the lower bound of the SM prediction from the experimental branching ratio, following the procedure in Refs. [93,94].

Table 5. The BDT cut, cut efficiencies and achieved maximal significance in Eq.

Table 6. The BDT cut, cut efficiencies and achieved maximal significance in Eq.